Universal fibronectin type III binding-domain libraries

ABSTRACT

Walk-through mutagenesis and natural-variant combinatorial fibronectin Type III (FN3) polypeptide libraries are described, along with their method of construction and use. Also disclosed are a number of high binding affinity polypeptides selected by screening the libraries against a variety of selected antigens.

This patent application claims priority to U.S. Provisional Patent Application No. 60/955,334 filed Aug. 10, 2007 and U.S. Provisional Patent Application No. 61/075,107 filed Jun. 24, 2008, both of which are incorporated herein by reference in their entirety.

BACKGROUND OF THE INVENTION

Scaffold-based binding proteins are becoming legitimate alternatives to antibodies in their ability to bind specific ligand targets. These scaffold binding proteins share the quality of having a stable framework core that can tolerate multiple substitutions in the ligand binding regions. Some scaffold frameworks have immunoglobulin-like protein domain architecture with loops extending from a beta sandwich core. A scaffold framework core can then be synthetically engineered, and a library of different sequence variants can be built upon it. The sequence diversity is typically concentrated in the exterior surfaces of the proteins, such as loop structures or other exterior surfaces that can serve as ligand binding regions.

Fibronectin Type III domain (FN3) was first identified as one of the repeating domains in the fibronectin protein. The FN3 domain constitutes a small (~94 amino acids), monomeric β-sandwich protein made up of seven β strands with three connecting loops. The three loops near the N-terminus of FN3 are functionally analogous to the complementarity-determining regions of immunoglobulin domains. FN3 loop libraries can then be engineered to bind to a variety of targets such as cytokines, growth factors, receptor molecules, and other proteins.

One potential problem in creating these synthetic libraries is the high frequency of unproductive variants, leading to inefficient candidate screens. For example, creating diversity in the variants often involves in vitro techniques such as random mutagenesis, saturation mutagenesis, error-prone PCR, and gene shuffling. These strategies are inherently stochastic and often require the construction of exceedingly large libraries to comprehensively explore sufficient sequence diversity. Additionally, there is no way to enumerate the number, type, and location of the mutations that have occurred in the protein. Furthermore, these random strategies create indiscriminate substitutions that cause protein architecture destabilization. It has been shown that improvement in one characteristic, such as affinity optimization, usually leads to decreased thermal stability when compared to the original protein scaffold framework.

Accordingly, a need exists for a fibronectin binding domain library that is systematic in construction. By bioinformatics-led design, the loop candidates are flexible for insertion into multiple FN3 scaffolds. By specific targeted loop substitutions, overall scaffold stability is maximized while, concurrently, non-immunogenic substitutions are minimized. Additionally, the library can be size tailored so that the overall diversity can be readily screened in different systems. Furthermore, the representative diversity of the designed loops is still capable of binding a number of pre-defined ligand targets. Moreover, the systematic design of the loops still allows subsequent affinity maturation of recovered binding clones.

SUMMARY OF THE INVENTION

In one aspect, the invention includes a natural-variant combinatorial library of fibronectin Type 3 domain polypeptides useful in screening for the presence of one or more polypeptides having a selected binding or enzymatic activity. The library polypeptides include (a) regions A, AB, B, C, CD, D, E, EF, F, and G having wildtype amino acid sequences of a selected native fibronectin Type 3 polypeptide or polypeptides, and (b) loop regions BC, DE, and FG having selected lengths. At least one selected loop region of a selected length contains a library of natural-variant combinatorial sequences expressed by a library of coding sequences that encode, at each loop position, a conserved or selected semi-conserved consensus amino acid and, if the consensus amino acid has a frequency of occurrence equal to or less than a selected threshold frequency of at least 50%, other natural variant amino acids, including semi-conserved amino acids and variable amino acids whose occurrence rate is above a selected minimum threshold occurrence at that position, or their chemical equivalents.

The given threshold may be 100%, unless the loop amino acid position contains only one dominant and one variant amino acid, and the dominant and variant amino acids are chemically similar amino acids, in which case the given threshold may be 90%. In this embodiment, the library contains all natural variants or their chemical equivalents having at least some reasonable occurrence frequency, e.g., 10%, in the selected loop and loop position.

The natural-variant combinatorial sequences may be in a combination of loops and loop lengths selected from loops BC and DE, BC and FG, and DE and FG, where the BC loop is selected from one of BC/11, BC/14, and BC/15, the DE loop is DE/6, and the FG loop is selected from one of FG/8 and FG/11.

The library may have, at two of the loop combinations BC and DE, BC and FG, and DE and FG, beneficial mutations identified by screening a natural-variant combinatorial library containing amino acid variants in the two-loop combination, and at the third loop, identified by FG, DE, and BC, respectively, a library of natural-variant combinatorial sequences at a third loop and loop length identified by BC/11, BC/14, and BC/15, DE/6, or FG/8 and FG/11.

In one embodiment, the library may have the wildtype amino acid sequences in regions A, AB, B, C, CD, D, E, EF, F, and G of the 14^(th) fibronectin Type III module of human fibronectin. In another embodiment, the library may have the wildtype amino acid sequences in regions A, AB, B, C, CD, D, E, EF, F, and G of the 10^(th) fibronectin Type III module of human fibronectin.

A natural-variant combinatorial library may have the following sequences for the indicated loops and loop lengths: (a) BC loop length of 11, and the amino acid sequence identified by SEQ ID NOS: 43 or 49; (b) BC loop length of 14, and the amino acid sequence identified by SEQ ID NOS: 44 or 50; (c) BC loop length of 15, and the amino acid sequence identified by SEQ ID NOS: 45 or 51; (d) DE loop length of 6, and the amino acid sequence identified by SEQ ID NOS: 46 or 52; (e) FG loop length of 8, and the amino acid sequence identified by SEQ ID NO: 47, for the first N-terminal six amino acids, or SEQ ID NO: 53; and (f) FG loop length of 11, and the amino acid sequence identified by SEQ ID NO: 48, for the first N-terminal nine amino acids, or SEQ ID NO: 54.

The library of polypeptides may be encoded by an expression library selected from the group consisting of a ribosome display library, a polysome display library, a phage display library, a bacterial expression library, and a yeast display library.

The libraries may be used in a method of identifying a polypeptide having a desired binding affinity, in which the natural-variant combinatorial library is screened to select for a fibronectin binding domain having a desired binding affinity. In particular, it has been found that the natural-variant combinatorial library provides high-binding polypeptides with high efficiency for a number of antigen targets, such as TNFα, VEGF, and HMGB1.

The screening may involve, for example, contacting the fibronectin binding domains with a target substrate, where the fibronectin binding domains are associated with the polynucleotide encoding the fibronectin binding domain. The method may further include identifying FN3 polynucleotides that encode the selected fibronectin binding domain.

Also disclosed is an expression library of polynucleotides encoding the above library of polypeptides, and produced by synthesizing polynucleotides encoding one or more framework regions and one or more loop regions wherein the polynucleotides are predetermined, wherein the polynucleotides encoding said regions further comprise sufficient overlapping sequence whereby the polynucleotide sequences, under polymerase chain reaction (PCR) conditions, are capable of assembly into polynucleotides encoding complete fibronectin binding domains.

In another aspect, the invention includes a walk-through mutagenesis library of fibronectin Type 3 domain polypeptides useful in screening for the presence of one or more polypeptides having a selected binding or enzymatic activity. The library polypeptides include (a) regions A, AB, B, C, CD, D, E, EF, F, and G having wildtype amino acid sequences of a selected native fibronectin Type 3 polypeptide or polypeptides, and (b) loop regions BC, DE, and FG having selected lengths. At least one selected loop region of a selected length contains a library of walk-through mutagenesis sequences expressed by a library of coding sequences that encode, at each loop position, a conserved or selected semi-conserved consensus amino acid and, if the consensus amino acid has an occurrence frequency equal to or less than a selected threshold frequency of at least 50%, a single common target amino acid and any co-produced amino acids.

The given threshold frequency may be 100%, or a selected frequency between 50-100%. The loops and loop lengths in the library may be selected from the group consisting of BC/11, BC/14, BC/15, DE/6, FG/8, and FG/11.

The library may have a library of walk-through mutagenesis sequences formed at each of the loops and loop lengths selected from the group consisting of BC/11, BC/14, BC/15, DE/6, FG/8, and FG/11, and for each common target amino acid selected from the group consisting of lysine, glutamine, aspartic acid, tyrosine, leucine, proline, serine, histidine, and glycine.

In one embodiment, the library may have the wildtype amino acid sequences in regions A, AB, B, C, CD, D, E, EF, F, and G of the 14^(th) fibronectin Type III module of human fibronectin. In another embodiment, the library may have the wildtype amino acid sequences in regions A, AB, B, C, CD, D, E, EF, F, and G of the 10^(th) fibronectin Type III module of human fibronectin.

In another aspect, the invention includes a method of forming a library of fibronectin Type 3 domain polypeptides useful in screening for the presence of one or more polypeptides having a selected binding or enzymatic activity. The method includes the steps of:

(i) aligning BC, DE, and FG amino acid loop sequences in a collection of native fibronectin Type 3 domain polypeptides,

(ii) segregating the aligned loop sequences according to loop length,

(iii) for a selected loop and loop length from step (ii), performing positional amino acid frequency analysis to determine the frequencies of amino acids at each loop position,

(iv) for each loop and loop length analyzed in step (iii), identifying at each position a conserved or selected semi-conserved consensus amino acid and other natural-variant amino acids,

(v) for at least one selected loop and loop length, forming:

(1) a library of walk-through mutagenesis sequences expressed by a library of coding sequences that encode, at each loop position, the consensus amino acid, and if the consensus amino acid has an occurrence frequency equal to or less than a selected threshold frequency of at least 50%, a single common target amino acid and any co-produced amino acids, or

(2) a library of natural-variant combinatorial sequences expressed by a library of coding sequences that encode, at each loop position, a consensus amino acid and, if the consensus amino acid has a frequency of occurrence equal to or less than a selected threshold frequency of at least 50%, other natural variant amino acids, including semi-conserved amino acids and variable amino acids whose occurrence rate is above a selected minimum threshold occurrence at that position, or their chemical equivalents,

(vi) incorporating the library of coding sequences into framework FN3 coding sequences to form an FN3 expression library, and

(vii) expressing the FN3 polypeptides of the expression library.
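As a rough illustration of steps (ii) through (v)(2), the following Python sketch groups loop sequences by length, tabulates per-position amino acid frequencies, and selects the amino acids to encode at one loop position. The function names, the toy loop sequences, and the default values (a 50% consensus threshold and a 10% minimum variant occurrence) are assumptions chosen for illustration only; they are not data from the specification, and the alignment of step (i) is assumed to have been performed already.

```python
from collections import Counter, defaultdict

def positional_frequencies(loops):
    """Steps (ii)-(iii): segregate loop sequences by length, then compute
    the frequency of each amino acid at each position per length class."""
    by_length = defaultdict(list)
    for seq in loops:
        by_length[len(seq)].append(seq)
    profiles = {}
    for length, seqs in by_length.items():
        profile = []
        for pos in range(length):
            counts = Counter(seq[pos] for seq in seqs)
            profile.append({aa: n / len(seqs) for aa, n in counts.items()})
        profiles[length] = profile
    return profiles

def design_position(freqs, threshold=0.5, min_variant=0.10):
    """Steps (iv)-(v)(2): keep only the consensus amino acid when its
    frequency exceeds the threshold; otherwise add every natural variant
    whose frequency is above the minimum occurrence (assumed values)."""
    ranked = sorted(freqs.items(), key=lambda kv: (-kv[1], kv[0]))
    consensus, top_freq = ranked[0]
    if top_freq > threshold:
        return [consensus]
    return [aa for aa, f in ranked if aa == consensus or f > min_variant]

# Hypothetical, already-aligned loop sequences (not from the specification).
toy_loops = ["SWTA", "SWSA", "SYTA", "SWTG"]
toy_profiles = positional_frequencies(toy_loops)
```

With a threshold of 100%, every position is treated as variable, which corresponds to the embodiment in which all natural variants above the minimum occurrence are included at every loop position.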

The method may be employed in producing various types of walk-through mutagenesis and natural-variant combinatorial libraries, such as those described above.

Also disclosed is a TNF-α binding protein having a K_(d) binding constant equal to or greater than 0.1 μM and having a sequence selected from SEQ ID NOS: 55-63; a VEGF binding protein having a K_(d) binding constant equal to or greater than 0.1 μM and having a sequence selected from SEQ ID NOS: 64-67; and an HMGB1 binding protein having a K_(d) binding constant equal to or greater than 0.1 μM and having a sequence selected from SEQ ID NOS: 67-81.

These and other objects and features of the invention will become more fully apparent when the following detailed description is read in conjunction with the accompanying drawings.

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1 is a schematic diagram illustrating the method for constructing fibronectin binding domain libraries using computer-assisted genetic database biomining and delineation of beta-scaffold and loop structures.

FIG. 2A is a schematic representation of the FN3 binding domain illustrating the two antiparallel beta-sheets of the domain. One half is composed of beta strands (ABE) and the other half is composed of (CDFG). The 6 CDR-like loops are also indicated: AB, BC, CD, DE, EF, and FG. Loops BC, DE and FG (dotted lines) are present at the N-terminus of the FN3 domain and are arranged to form ligand binding surfaces. The RGD sequence is located in the FG loop.

FIG. 2B is a ribbon diagram of the FN3 binding domain illustrating that the BC, DE and FG loops (dotted lines) are present at the N-terminus of the FN3 domain and are arranged to form ligand binding surfaces.

FIGS. 3A and 3B are (3A) a ribbon diagram of the structural overlay comparisons of the overall loop and beta-strand scaffolds between FN3 module 10 and module 14, and (3B) structural overlay comparisons of the FG loop and F and G beta-strand boundaries between FN3 modules 10, 13, and 14, indicating that the positions of the loop acceptor sites are well conserved in the FN3 protein domain architecture even though their respective loops may be quite varied in topology.

FIG. 4 is a schematic representation of the FN3 loop and beta-strand amino acid numberings for the BC, DE and FG loop inserts (light shading).

FIG. 5B shows amino acid sequence alignment of the 1^(st)-16^(th) fibronectin Type III modules of human fibronectin. The location of three loops (BC, DE, and FG) and several highly conserved residues throughout the fibronectin binding domain, including W22, Y/F32, V50, A57, A74, and I/L88, are indicated above the alignment. The conserved amino acids are used as landmarks to aid in aligning the FN3 modules and to introduce gaps where necessary.

FIG. 6 is a bar graph showing BC loop length diversity derived from bioinformatics analysis of all FN3 modules. BC loop lengths 11, 14 and 15 are the predominant sizes seen in expressed FN3 sequences.

FIG. 7 is a bar graph showing DE loop length diversity derived from bioinformatics analysis of all FN3 modules. DE loop length 6 is the single most predominant size seen in expressed FN3 sequences.

FIG. 8 is a bar graph showing FG loop length diversity derived from bioinformatics analysis of all FN3 modules. FG loop lengths 8 and 11 are the most predominant sizes seen in expressed FN3 sequences.

FIG. 9 shows sequence diversity of an exemplary loop region in the form of amino acid variability profile (frequency distribution) for BC loop length size 11.

FIG. 10 shows sequence diversity of an exemplary loop region in the form of amino acid variability profile (frequency distribution) for BC loop length size 14.

FIG. 11 shows sequence diversity of an exemplary loop region in the form of amino acid variability profile (frequency distribution) for BC loop length size 15.

FIG. 12 shows sequence diversity of an exemplary loop region in the form of amino acid variability profile (frequency distribution) for DE loop length size 6.

FIG. 13 shows amino acid sequence diversity of FG loop region in the form of amino acid variability profile (frequency distribution) for FG loop length size 8.

FIG. 14 shows sequence diversity of FG loop region in the form of amino acid variability profile (frequency distribution) for FG loop length size 11.

FIGS. 15A-15I show the base fixed sequence and variable positions of a BC loop length size 11 and the amino acid matrix showing wild type, the WTM target positions and the extra potential diversity generated from the degenerate WTM codons for each of the selected amino acids K (15A), Q (15B), D (15C), Y (15D), L (15E), P (15F), S (15G), H (15H), and G (15I).

FIGS. 16A-16I show the base fixed sequence and variable positions of a BC loop length size 15 and the amino acid matrix showing wild type, the WTM target positions and the extra potential diversity generated from the degenerate WTM codons for each of the selected amino acids K (16A), Q (16B), D (16C), Y (16D), L (16E), P (16F), S (16G), H (16H), and G (16I).

FIGS. 17A-17I show the base fixed sequence and variable positions of a DE loop length size 6 and matrix showing wild type, the WTM target positions and the extra potential diversity generated from the degenerate WTM codons for each of the selected amino acids K (17A), Q (17B), D (17C), Y (17D), L (17E), P (17F), S (17G), H (17H), and G (17I).

FIGS. 18A-18I show the base fixed sequence and variable positions of an FG loop length size 11 amino acids and matrix showing wild type, the WTM target positions and the extra potential diversity generated from the degenerate WTM codons for each of the selected amino acids K (18A), Q (18B), D (18C), Y (18D), L (18E), P (18F), S (18G), H (18H), and G (18I).

FIG. 19 shows the degenerate and mixed base DNA oligonucleotide sequences for the fixed and variable positions of a BC loop length size 15, and the amino acid matrix showing wild type, the WTM target positions and the extra potential diversity generated from the degenerate WTM codons.

FIG. 20 shows construction of the FN3 binding domain library using a combination of overlapping nondegenerate and degenerate oligonucleotides that can be converted to double-stranded nucleic acids using the single overlap extension polymerase chain reaction (SOE-PCR). Eight oligonucleotides are required for the entire gene. Each loop (BC, DE, and FG) is encoded by a separate series of degenerate oligonucleotides.

FIG. 21 shows loop diversity by WTM in the BC, DE, and FG variable positions and total fibronectin binding domain library size when combining the different FN3 loops.

FIG. 22 shows the modular construction using different FN III binding domains of module 14 and Tenascin using their respective set of overlapping non-degenerate and degenerate oligonucleotides. The same BC, DE and FG loop diversity library can be placed into their respective module 14 and Tenascin loop positions.

FIG. 23 shows the construction of the natural-variant amino acid library for loop BC of length 11.

FIGS. 24A-24C show ELISA with three selected anti-TNFα 14FN3 variants, A6, C10, and C5, for binding to TNFα (light bars) and VEGF (dark bars) (24A); binding specificity of anti-TNFα 14FN3 variants with respect to TNFα (light bars), control (dark bars) and VEGF (white bars) (24B); and sequences of anti-TNFα 14FN3 variants (24C).

FIGS. 25A-25C show binding specificity of anti-VEGF 14FN3 variants with respect to VEGF (light bars), TNFα (dark bars), and control (white bars) (25A); sequences of three anti-VEGF 14FN3 variants (25B); and Octet analysis of variant R1D4 (25C).

FIGS. 26A-26C show sequences of anti-HMGB1 14FN3 variants (26A); specific binding by anti-HMGB1 variants with respect to HMGB1 (light bars), TNF (dark bars), and control (white bars) (26B); and binding kinetics of HMGB1 variants (26C).

DETAILED DESCRIPTION OF THE INVENTION

I. Definitions

The terms below have the following meanings unless indicated otherwise in the specification:

“Fibronectin Type III (FN3) domain polypeptides” or “FN3 polypeptides” refer to polypeptides having the Fibronectin Type III domain or module discussed in Section II below, where one or more modules will make up a fibronectin-type protein (FN3 protein), such as the sixteen different FN3 modules making up human fibronectin (FN), and the 15 different FN3 modules making up tenascin. Individual FN3 domain polypeptides are referred to by module number and protein name, e.g., the 10^(th) or 14^(th) module of human fibronectin (10/FN or 14/FN) or the 1^(st) module of tenascin (1/tenascin).

A “library” of FN3 polypeptides refers to a collection of FN3 polypeptides having a selected sequence variation or diversity in at least one of the BC, DE, and FG loops of a defined length (see Section II below). The term “library” is also used to refer to the collection of amino acid sequences within a selected BC, DE, or FG loop of a selected length, and to the collection of coding sequences that encode loop or polypeptide amino acid libraries.

A “universal FN3 library” refers to an FN3 polypeptide library in which amino acid diversity in one or more of the BC, DE or FG loop regions is determined by or reflects the amino acid variants present in a collection of known FN3 sequences.

The term “conserved amino acid residue” or “fixed amino acid” refers to an amino acid residue determined to occur with a frequency that is high, typically at least 50% or more (e.g., at about 60%, 70%, 80%, 90%, 95%, or 100%), for a given residue position. When a given residue is determined to occur at such a high frequency, i.e., above a threshold of about 50%, it may be determined to be conserved and thus represented in the libraries of the invention as a “fixed” or “constant” residue, at least for that amino acid residue position in the loop region being analyzed.

The term “semi-conserved amino acid residue” refers to amino acid residues determined to occur with a frequency that is high, for 2 to 3 residues for a given residue position. When 2-3 residues, preferably 2 residues, together are represented at a frequency of about 40% of the time or higher (e.g., 50%, 60%, 70%, 80%, 90% or higher), the residues are determined to be semi-conserved and thus represented in the libraries of the invention as “semi-fixed”, at least for that amino acid residue position in the loop region being analyzed. Typically, an appropriate level of nucleic acid mutagenesis/variability is introduced for a semi-conserved amino acid (codon) position such that the 2 to 3 residues are properly represented. Thus, each of the 2 to 3 residues can be said to be “semi-fixed” for this position. A “selected semi-conserved amino acid residue” is a selected one of the 2 or more semi-conserved amino acid residues, typically, but not necessarily, the residue having the highest occurrence frequency at that position.

The term “variable amino acid residue” refers to amino acid residues determined to occur with a lower frequency (less than 20%) for a given residue position. When many residues appear at a given position, the residue position is determined to be variable and thus represented in the libraries of the invention as variable, at least for that amino acid residue position in the loop region being analyzed. Typically, an appropriate level of nucleic acid mutagenesis/variability is introduced for a variable amino acid (codon) position such that an accurate spectrum of residues is properly represented. Of course, it is understood that, if desired, the consequences or variability of any amino acid residue position, i.e., conserved, semi-conserved, or variable, can be represented, explored or altered using, as appropriate, any of the mutagenesis methods disclosed herein, e.g., WTM and natural-variant combinatorial libraries. A lower threshold frequency of occurrence of variable amino acids may be, for example, 5-10% or lower. Below this threshold, variable amino acids may be omitted from the natural-variant amino acids at that position.

A “consensus” amino acid in a BC, DE, or FG loop of an FN3 polypeptide is a conserved amino acid or a selected one of the semi-conserved amino acids.

“Natural-variant amino acids” include conserved, semi-conserved, and variable amino acid residues observed, in accordance with their occurrence frequencies, at a given position in a selected loop of a selected length. The natural-variant amino acids may be substituted by chemically equivalent amino acids, and may exclude variable amino acid residues below a selected occurrence frequency, e.g., 5-10%, or amino acid residues that are chemically equivalent to other natural-variant amino acids.

A “library of walk-through mutagenesis sequences” refers to a library of sequences within a selected FN3 loop and loop length which is expressed by a library of coding sequences that encode, at each loop position, a conserved or selected semi-conserved consensus amino acid and, if the consensus amino acid has an occurrence frequency equal to or less than a selected threshold frequency of at least 50%, a single common target amino acid and any co-produced amino acids. Thus, for each target amino acid, the library of walk-through mutagenesis sequences within a given loop will contain the target amino acid at all combinations of one to all positions within the loop at which the consensus amino acid has an occurrence frequency equal to or less than the given threshold frequency. If this threshold frequency is set at 100%, each position in the loop will contain the target amino acid in at least one library member. The term “library of walk-through mutagenesis sequences” also encompasses a mixture of walk-through mutagenesis libraries, one for each target amino acid, e.g., each of nine different target amino acids.
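The combinatorial content of a walk-through mutagenesis library for one target amino acid can be sketched as follows. This is a simplified model under stated assumptions: the loop sequence and variable positions are hypothetical, the function name is illustrative, and co-produced amino acids arising from degenerate codons are ignored, so each variable position offers only the wild-type residue or the target residue.

```python
from itertools import product

def wtm_sequences(wild_type, variable_positions, target):
    """Enumerate loop sequences in which each variable position carries
    either its wild-type residue or the single common target residue.
    Co-produced amino acids from codon degeneracy are not modeled."""
    choices = []
    for i, aa in enumerate(wild_type):
        if i in variable_positions and aa != target:
            choices.append((aa, target))   # wild type or target
        else:
            choices.append((aa,))          # fixed position
    return ["".join(combo) for combo in product(*choices)]

# Hypothetical 4-residue loop with two variable positions, target lysine (K).
toy_library = wtm_sequences("SWTG", {1, 2}, "K")
```

For n variable positions this model yields 2^n sequences (including the unmodified wild-type loop), which is why the target residue appears at all combinations of one to all variable positions.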

A “library of natural-variant combinatorial sequences” refers to a library of sequences within a selected FN3 loop and loop length which is expressed by a library of coding sequences that encode, at each loop position, a conserved or selected semi-conserved consensus amino acid and, if the consensus amino acid has a frequency of occurrence equal to or less than a selected threshold frequency of at least 50%, other natural variant amino acids, including semi-conserved amino acids and variable amino acids whose occurrence rate is above a selected minimum threshold occurrence at that position, or their chemical equivalents. Thus, for each amino acid position in a selected loop and loop length, the library of natural-variant combinatorial sequences will contain the consensus amino acid at that position plus other amino acid variants identified as having at least some minimum frequency at that position, e.g., at least 5-10% frequency, or chemically equivalent amino acids. In addition, natural variants may be substituted or dropped if the coding sequence for that amino acid produces a significant number of co-produced amino acids, via codon degeneracy. The average number of encoded amino acid variants in the loop region will typically be between 3-5, e.g., 4, for loops having a loop length of 10 or more, e.g., BC/11, BC/14, BC/15, and may have an average number of substitutions of 6 or more for shorter loops, e.g., DE/6, FG/8 and FG/11 (where variation occurs only at six positions), such that the total diversity of a typical loop region can be maintained in the range of preferably about 10⁴-10⁷ for an FN3 BC, DE, or FG loop, and the diversity for two of the three loops can be maintained in the range of about 10¹² or less. It will be appreciated from Examples 9 and 10 below that the natural variants at any loop position can be limited to the topmost frequent 3-5 variants, where natural variants that are omitted are those for which the codon change for that amino acid would also lead to a significant number of co-produced amino acids, where the variant is already represented in the sequence by a chemically equivalent amino acid, or where the frequency of that amino acid in the sequence profile is relatively low, e.g., 10% or less.
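The diversity ranges quoted above follow from multiplying the number of encoded variants at each loop position. The short Python sketch below works through that arithmetic using the averages mentioned in the text (about 4 variants per position for an 11-residue loop, about 6 variants at six variable positions for a shorter loop); the function name and the uniform per-position counts are simplifying assumptions, since real loops have position-specific variant counts.

```python
import math

def loop_diversity(variants_per_position):
    """Theoretical number of distinct loop sequences: the product of the
    number of encoded amino acids at each position."""
    return math.prod(variants_per_position)

bc11 = loop_diversity([4] * 11)   # 11 positions, 4 variants each (assumed uniform)
de6 = loop_diversity([6] * 6)     # 6 variable positions, 6 variants each
two_loop = bc11 * de6             # combined diversity of the two loops

print(f"BC/11: {bc11:.2e}, DE/6: {de6:.2e}, two loops: {two_loop:.2e}")
```

Under these assumed counts each single loop falls inside the 10⁴-10⁷ range and the two-loop combination stays below 10¹².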

The term “framework region” refers to the art-recognized portions of a fibronectin beta-strand scaffold that exist between the more divergent loop regions. Such framework regions are typically referred to as the beta strands A through G that collectively provide a scaffold from which the six defined loops can extend to form a ligand contact surface(s). In fibronectin, the seven beta-strands orient themselves as two beta-pleats to form a beta sandwich. The framework region may also include loops AB, CD, and EF between strands A and B, C and D, and E and F. Variable-sequence loops BC, DE, and FG may also be referred to as framework regions in which mutagenesis is introduced to create the desired amino acid diversity within the region.

The term “ligand” or “antigen” refers to compounds which are structurally/chemically similar in terms of their basic composition. Typical ligand classes are proteins (polypeptides), peptides, polysaccharides, polynucleotides, and small molecules. Ligands can be equivalent to “antigens” when recognized by specific antibodies.

The term “loop region” refers to a peptide sequence not assigned to the beta-strand pleats. In the fibronectin binding scaffold there are six loop regions, three of which are known to be involved in binding domains of the scaffold (BC, DE, and FG), and three of which are located on the opposite side of the polypeptide (AB, EF, and CD). In the present invention, sequence diversity is built into one or more of the BC, DE, and FG loops, whereas the AB, CD, and EF loops are generally assigned the wildtype amino acid sequences of the FN3 polypeptide from which other framework regions of the polypeptide are derived.

The term “variability profile” refers to the cataloguing of amino acids and their respective frequency rates of occurrence present at a particular loop position. The loop positions are derived from an aligned fibronectin dataset. At each loop position, ranked amino acid frequencies are added to that position's variability profile until the amino acids' combined frequencies reach a predetermined “high” threshold value.
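The accumulation rule in this definition can be sketched in Python as follows. The function name, the 90% default for the “high” threshold, and the example frequencies are assumptions for illustration; the specification does not fix a particular threshold value here.

```python
def variability_profile(freqs, high_threshold=0.9):
    """Rank amino acids at one loop position by frequency and add them to
    the profile until their combined frequency reaches the threshold."""
    profile = []
    cumulative = 0.0
    # Sort by descending frequency; break ties alphabetically for determinism.
    for aa, f in sorted(freqs.items(), key=lambda kv: (-kv[1], kv[0])):
        profile.append((aa, f))
        cumulative += f
        if cumulative >= high_threshold:
            break
    return profile

# Hypothetical frequencies at a single loop position.
toy_position = {"A": 0.40, "G": 0.35, "S": 0.20, "T": 0.05}
toy_profile = variability_profile(toy_position)
```

In this toy case the profile stops after the third-ranked amino acid, so the rare fourth residue is left out of the position's variability profile.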

The term “amino acid” or “amino acid residue” typically refers to an amino acid having its art-recognized definition such as an amino acid selected from the group consisting of: alanine (Ala, A); arginine (Arg, R); asparagine (Asn, N); aspartic acid (Asp, D); cysteine (Cys, C); glutamine (Gln, Q); glutamic acid (Glu, E); glycine (Gly, G); histidine (His, H); isoleucine (Ile, I); leucine (Leu, L); lysine (Lys, K); methionine (Met, M); phenylalanine (Phe, F); proline (Pro, P); serine (Ser, S); threonine (Thr, T); tryptophan (Trp, W); tyrosine (Tyr, Y); and valine (Val, V), although modified, synthetic, or rare amino acids may be used as desired.

“Chemically equivalent amino acids” refer to amino acids that have similar steric, charge, and solubility properties. One common scheme groups amino acids in the following way: (1) glycine, having a hydrogen side chain; (2) alanine (Ala, A), valine (Val, V), leucine (Leu, L), and isoleucine (Ile, I), having hydrogen or an unsubstituted aliphatic side chain; (3) serine (Ser, S) and threonine (Thr, T), having an aliphatic side chain bearing a hydroxyl group; (4) aspartic (Asp, D) and glutamic acid (Glu, E), having a carboxyl-containing side chain; (5) asparagine (Asn, N) and glutamine (Gln, Q), having an aliphatic side chain terminating in an amide group; (6) arginine (Arg, R), lysine (Lys, K) and histidine (His, H), having an aliphatic side chain terminating in a basic amino group; (7) cysteine (Cys, C) and methionine (Met, M), having a sulfur-containing aliphatic side chain; (8) tyrosine (Tyr, Y) and phenylalanine (Phe, F), having an aromatic side chain; and (9) tryptophan (Trp, W), proline (Pro, P), and histidine (His, H), having a heterocyclic side chain.

The term “polynucleotide(s)” refers to nucleic acids such as DNA molecules and RNA molecules and analogs thereof (e.g., DNA or RNA generated using nucleotide analogs or using nucleic acid chemistry). As desired, the polynucleotides may be made synthetically, e.g., using art-recognized nucleic acid chemistry or enzymatically using, e.g., a polymerase, and, if desired, be modified. Typical modifications include methylation, biotinylation, and other art-known modifications. In addition, the nucleic acid molecule can be single-stranded or double-stranded and, where desired, linked to a detectable moiety. Polynucleotide bases and alternative base pairs are given their usual abbreviations herein: Adenosine (A), Guanosine (G), Cytidine (C), Thymidine (T), Uridine (U), puRine (R=A/G), pYrimidine (Y=C/T or C/U), aMino (M=A/C), Keto (K=G/T or G/U), Strong (S=G/C), Weak (W=A/T or A/U), V (A or C or G, but not T), N or X (any base).
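As an illustration of how these degenerate-base abbreviations are read, the following Python sketch expands a degenerate DNA sequence into every concrete sequence it stands for. The dictionary covers only the symbols listed in the paragraph above (DNA alphabet; U would replace T for RNA), and the names `DEGENERATE_BASES` and `expand` are assumptions made for the example.

```python
from itertools import product

# Degenerate symbols as listed in the text above (DNA alphabet).
DEGENERATE_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",   # puRine
    "Y": "CT",   # pYrimidine
    "M": "AC",   # aMino
    "K": "GT",   # Keto
    "S": "CG",   # Strong
    "W": "AT",   # Weak
    "V": "ACG",  # A or C or G, but not T
    "N": "ACGT", # any base
}

def expand(degenerate_seq):
    """Return every concrete DNA sequence encoded by a degenerate sequence."""
    choices = (DEGENERATE_BASES[base] for base in degenerate_seq.upper())
    return ["".join(bases) for bases in product(*choices)]

codons = expand("RAM")  # R = A/G, M = A/C
```

Here `expand("RAM")` yields the four codons AAA, AAC, GAA and GAC, and a fully degenerate codon such as NNK expands to 32 codons.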

The term “mutagenesis” refers to, unless otherwise specified, any art recognized technique for altering a polynucleotide or polypeptide sequence. Preferred types of mutagenesis include walk-through mutagenesis (WTM), natural-variant combinatorial mutagenesis, and beneficial natural-variant combinatorial mutagenesis, although other mutagenesis libraries may be employed, including look-through mutagenesis (LTM), improved look-through mutagenesis (LTM2), WTM using doped nucleotides for achieving codon bias, extended WTM for holding short regions of sequence as constant or fixed within a region of greater diversity, or combinations thereof.

The term “beneficial natural-variant combinational library” refers to a combination library of coding sequences that encode, in two of the three BC, DE, and FG loops of the polypeptide, beneficial mutations determined by screening natural-variant combinatorial libraries containing sequence diversity in those two loops, and natural-variant combinatorial amino acids in the third loop.

II. Overview of the Method and Libraries

Artificial antibody scaffolds that bind specific ligands are becoming legitimate alternatives to antibodies. Antibodies have been useful as both diagnostic and therapeutic tools. However, obtaining specific antibodies recognizing certain ligands has been difficult. Current antibody libraries are biased against certain antigen classes only after immunological exposure. Therefore it is frequently necessary to immunize a host animal with a particular antigen before recovery of specific antibodies can occur. Furthermore, these in vivo derived antibody libraries usually do not have candidates that recognize self antigens. These are usually lost in an expressed human library because self reactive antibodies are removed by the donor's immune system by negative selection. Furthermore, antibodies are difficult and expensive to produce, requiring special cell fermentation reactors and purification procedures.

The limitations of antibodies have spurred the development of alternative binding proteins based on immunoglobulin like folds or other protein topologies. These non-antibody scaffolds share the general quality of having a structurally stable framework core that is tolerant to multiple substitutions in other parts of the protein.

The present invention provides a universal fibronectin binding domain library that is more comprehensive and engineered to have artificial diversity in the ligand binding loops. By creating artificial diversity, the library size can be controlled so that the library can be readily screened using, for example, high throughput methods to obtain new therapeutics. The universal fibronectin library can be screened using positive physical clone selection by FACS, phage panning or selective ligand retention. These in vitro screens bypass the standard and tedious methodology inherent in generating an antibody hybridoma library and supernatant screening.

Furthermore, the universal fibronectin library has the potential to recognize any antigen, as the constituent amino acids in the binding loop are created by in vitro diversity techniques. This gives the library two significant advantages: control over diversity size and the capacity to recognize self antigens. Still further, the fibronectin binding domain library can be propagated and re-screened to discover additional fibronectin binding modules against other desired targets.

IIA. Fibronectin Review: (FN)

Fibronectin Type III (FN3) proteins refer to a group of proteins composed of monomeric subunits having Fibronectin Type III (FN3) structure or motif made up of seven β-strands with three connecting loops. β-strands A, B, and E form one half of the β-sandwich and β-strands C, D, F, and G form the other half (see FIGS. 2 a and 2 b). The subunits have lengths of about 94 amino acids and molecular weights of about 10 kDa. The overall fold of the FN3 domain is closely related to that of the immunoglobulin domains, and the three loops near the N-terminus of FN3, named BC, DE, and FG (as illustrated in FIG. 2 b), can be considered structurally analogous to the antibody variable heavy (VH) domain complementarity-determining regions, CDR1, CDR2, and CDR3, respectively. Table 1 below shows several FN3 proteins, and the number of different FN3 modules or domains associated with each protein. Thus, fibronectin itself is composed of 16 different modules or domains whose amino acid sequences are shown in aligned form in FIG. 5. A given module of an FN3 protein is identified by module number and protein name, for example, the 14^(th) FN3 module of human fibronectin (14/FN or 14/FN3), the 10^(th) FN3 module of human fibronectin (10/FN or 10/FN3), the 1^(st) FN3 module of tenascin (1/tenascin), and so forth.

TABLE 1
Representative FN3 proteins and their modules

FN3 Protein                                        FN3 modules
Angiopoietin 1 receptor                            3
Contactin protein                                  4
Cytokine receptor common β chain                   2
Down syndrome cell adhesion protein                6
Drosophila Sevenless protein                       7
Erythropoietin receptor                            1
Fibronectin                                        16
Growth hormone receptor                            1
Insulin receptor                                   2
Insulin-like growth factor I receptor              3
Interferon-γ receptor β chain                      2
Interleukin-12 β chain                             1
Interleukin-2 receptor β chain                     1
Leptin receptor (LEP-R)                            3
Leukemia inhibitory factor receptor (LIF-R)        6
Leukocyte common antigen                           2
Neural cell adhesion protein L1                    4
Prolactin receptor                                 2
Tenascin protein                                   15
Thrombopoietin receptor                            2
Tyrosine-protein kinase receptor Tie-1             3

Fibronectin itself is involved in many cellular processes, including tissue repair, embryogenesis, and blood clotting, by serving as a general cell adhesion molecule anchoring cells to integrin, collagen or other proteoglycan substrates. In addition, fibronectin also can serve to organize the extracellular matrix binding to different components, such as heparin, to membrane-bound receptors on cell surfaces. The amino acid sequence of fibronectin reveals three types of internally homologous repeats or modules separated by (usually) short connecting sequences. There are 12 type I, 2 type II and 16 type III modules, referred to as FN I, FN II and FN III, respectively. Each FN module constitutes an independently folded unit, often referred to as a domain. As noted above, modules homologous to those in fibronectin are also found in other proteins, especially the FN3 motif, which is one of the most ubiquitous of all modules, being found in extracellular receptor kinases, phosphatases, tenascin and others. Since its discovery, this FN3 domain has been found in many animal proteins and is estimated to occur in 2% of the proteins sequenced to date. Within fibronectin itself, there are sixteen FN3 domains, which have remarkably similar tertiary structures. Interestingly, while FN3 conformations are highly conserved, the similarity between different modules of the same type within a given fibronectin protein is quite low, typically less than 20%. In contrast, the amino acid sequence homology for the same FN-III modules across multiple species is notably higher, approximately 80%-90%.

Fibronectin modules fold independently and thus can exist in isolation from their neighbors. The three dimensional structures of several examples of each type of fibronectin module have been determined. As expected from the well-known relationship between amino acid sequence and 3D structure, modules of the same type have similar folds. All three types of module are composed almost exclusively of antiparallel β sheets and turns, with little or no alpha helix. In FN3 modules, the top sheet contains four antiparallel beta strands and the bottom sheet is three-stranded. Disulphide bridges do not stabilize FN3 structure. Instead, stabilization occurs solely through hydrophobic interactions in the module core.

IIB. Identifying and Selecting Fibronectin Scaffold and Loop Components Using Bioinformatics

The first step in building a universal fibronectin library of the invention is selecting sequences that meet certain predetermined criteria. PFAM, ProSite and similar databases were searched for sequences containing FN3 domains (FIG. 1). These electronic databases (box 30 in FIG. 1) contain catalogued expressed fibronectin and fibronectin-like protein sequences and can be queried for FN3 module and similar sequences (using the BLAST search algorithm, box 32). The FN3 module sequences can then be grouped according to predefined criteria such as module subclasses, sequence similarity or originating organism(s). The framework sequence selection can also be performed for scaffold proteins such as FN I, FN II or ankyrin and other proteins (box 34). Example 1 provides additional details of the method.

Candidate FN3 β-strand scaffold framework sequences are then delineated, whereupon the intervening loop regions and constituent amino acids are identified (box 36). This determines the length of the existing loops, the amino acid profiles for each loop length and, hence, the physical size and amino acid diversity that can be accommodated within these frameworks. With reference to FIG. 1, once the loops are identified, sequences within each loop are aligned, at 38, 40, and 42, and the aligned sequences are then split into groups according to loop length, as at 44. The distributions of loop lengths for the BC, DE, and FG loops are shown in FIGS. 6-8, respectively. Using this information, the most common loop sizes are selected, at 46. In a general embodiment of the invention that will be illustrated below, the selected loop lengths are BC/11, BC/14, BC/15, DE/6, FG/8, and FG/11, as shown at 48 for BC/11. For each β-strand, one can determine the preferred loop acceptor sites in the frameworks based on both comparative structural and sequence analysis (FIGS. 3 and 5). For example, FIG. 3 a shows a structural overlay comparison of the overall loop and β strand scaffolds between the fibronectin 10/FN3 and 14/FN3. FIG. 3 b shows the structural overlay comparison of the FG loop and F and G β strand boundaries between FN3 modules 10, 13 and 14. By identifying precise loop positions, the above step greatly reduces diversity-generating loop mutations that would not result in functional ligand binding specificity. Additional details can be found in Example 2.
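The grouping of loop sequences by length and the selection of the most common loop sizes can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function names and the short toy sequences are hypothetical and are not data from FIGS. 6-8.

```python
from collections import Counter, defaultdict

def group_by_length(loop_seqs):
    """Split extracted loop sequences into groups keyed by loop length."""
    groups = defaultdict(list)
    for seq in loop_seqs:
        groups[len(seq)].append(seq)
    return dict(groups)

def most_common_lengths(loop_seqs, n=3):
    """Return the n most frequent loop lengths, most frequent first."""
    counts = Counter(len(seq) for seq in loop_seqs)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [length for length, _ in ranked[:n]]

# Hypothetical loop sequences, for illustration only.
toy_loops = ["GSKSTA", "PGTEYT", "TDSSAT", "GNSPVQEF", "RDGQERDA", "AT"]
selected = most_common_lengths(toy_loops, n=2)
```

For the toy input, lengths 6 and 8 are selected, and `group_by_length(toy_loops)[6]` holds the three six-residue loops that would go on to positional frequency analysis.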

Once loop lengths are selected, a positional amino acid frequency analysis is performed at each loop position, to determine the frequency of occurrence of each amino acid in a set of native FN3 modules, e.g., all human FN3 modules, at box 50. This method includes a frequency analysis and the generation of the corresponding variability profiles (VP) of existing loop sequences (see Example 2 and FIGS. 9-14). High frequency (e.g., >50%) positions are considered conserved or fixed. Moderately high frequency or “semi-conserved” amino acids (when 2 or 3 combined account for >40%) are chosen as “wildtype” at other positions. These wildtype amino acids are then systematically altered using mutagenesis, e.g., walk-through mutagenesis (WTM), to generate the universal loop library (see Example 3). “Variable” positions are those where, typically, no one amino acid accounts for more than 20% of the represented set.
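One way to read the three frequency categories above is sketched below in Python. The cut-offs (>50%, >40% for the top two or three combined, and no more than 20%) come from the paragraph; the order in which the rules are tested, the fallback to “variable” for positions that match none of them, and the toy alignment are assumptions made for the example.

```python
from collections import Counter

def classify_position(column, conserved_min=0.50, semi_min=0.40, variable_max=0.20):
    """Classify one aligned loop position as conserved, semi-conserved or variable."""
    counts = Counter(column)
    total = len(column)
    freqs = sorted((n / total for n in counts.values()), reverse=True)
    top = freqs[0]
    if top > conserved_min:          # one amino acid dominates
        return "conserved"
    if top <= variable_max:          # no amino acid exceeds 20%
        return "variable"
    if sum(freqs[:3]) > semi_min:    # top 2-3 amino acids combined exceed 40%
        return "semi-conserved"
    return "variable"                # assumed fallback, not stated in the text

def classify_loop(aligned_loops):
    """Classify every column of a set of equal-length aligned loop sequences."""
    return [classify_position(col) for col in zip(*aligned_loops)]

# Hypothetical alignment of ten three-residue loops.
toy_alignment = ["GAA", "GAC", "GAD", "GSE", "GSF", "GSG", "GTH", "STI", "SGK", "APL"]
categories = classify_loop(toy_alignment)
```

In the toy alignment the first column is 70% G (conserved), the second is split among A, S and T (semi-conserved), and the third has ten different residues (variable).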

The choice of candidate frameworks based on the criteria of the invention dictates both the loop sizes to be introduced and the initial amino acid sequence diversity.

A loop variability profile analysis of the FN3 databases allows identification of loop amino acid residue positions that fall within three categories, e.g., 1) positions that should be conserved or “fixed,” and 2) semi-conserved and/or 3) variable positions, which are suitable for diversity generation. A variability profile analysis is performed and a threshold frequency is used to identify the most favorable sequences to be used in designating the overall loop diversity (box 52).

The conserved or a selected semi-conserved sequence (typically the most frequent amino acid in the semi-conserved residues) is considered the “wild type” or “consensus” residue in the loop sequence. Surprisingly, this “consensus” or “frequency” approach identifies those particular amino acids under high selective pressure. Accordingly, these residue positions are typically fixed, with diversity being introduced into remaining amino acid positions (taking into account the identified preference for certain amino acids to be present at these positions). As will be seen below, the threshold for occurrence frequency at which amino acid variation will be introduced can vary between selected levels as low as 40%, preferably 50%, to as high as 100%. At the 100% threshold frequency, WTM amino acids can be introduced at all positions of the loop, and the only constraints on natural-variant amino acids will be the total number of variants and whether chemical equivalents are available.

When designing the diversity for any of the above-mentioned loops, modified amino acid residues, for example, residues outside the traditional 20 amino acids used in most polypeptides, e.g., homocysteine, can be incorporated into the loops as desired. This is carried out using art recognized techniques which typically introduce stop codons into the polynucleotide where the modified amino acid residue is desired. The technique then provides a modified tRNA linked to the modified amino acid to be incorporated (a so-called suppressor tRNA of, e.g., the stop codon amber, opal, or ochre) into the polypeptide (see, e.g., Köhrer et al., Import of amber and ochre suppressor tRNAs into mammalian cells: A general approach to site-specific insertion of amino acid analogues into proteins, PNAS, 98, 14310-14315 (2001)).

The bioinformatic analysis focuses on FN3 module genes for descriptive purposes, but it will be understood that genes for other FN modules and other scaffold proteins are similarly evaluated.

IIC. Computer-Assisted Universal Fibronectin Library Construction

The universal fibronectin loop libraries of the invention are designed and constructed with the benefit of sequence and structural information, such that the potential for generating improved fibronectin binding domains is increased. Structural molecular replacement modeling information can also be used to guide the selection of amino acid diversity to be introduced into the defined loop regions. Still further, actual results obtained with the fibronectin binding domains of the invention can guide the selection (or exclusion), e.g., affinity maturation, of subsequent fibronectin binding domains to be made and screened in an iterative manner.

In the preferred embodiment, in silico modeling is used to eliminate the production of any fibronectin binding domains predicted to have poor or undesired structure and/or function. In this way, the number of fibronectin binding domains to be produced can be sharply reduced, thereby increasing signal-to-noise in subsequent screening assays. In another particular embodiment, the in silico modeling is continually updated with additional modeling information, from any relevant source, e.g., from gene and protein sequence and three-dimensional databases and/or results from previously tested fibronectin binding domains, so that the in silico database becomes more precise in its predictive ability (FIG. 1).

In yet another embodiment, the in silico database is provided with the assay results, e.g., binding affinity/avidity of previously tested fibronectin binding domains, and categorizes the fibronectin binding domains, based on the assay criterion or criteria, as responders or nonresponders, e.g., as fibronectin binding domain molecules that bind well or not so well. In this way, the affinity maturation of the invention can equate a range of functional responses with particular sequence and structural information and use such information to guide the production of future fibronectin binding domains to be tested. The method is especially suitable for screening fibronectin binding domains for a particular binding affinity to a target ligand using, e.g., a Biacore assay.

Accordingly, mutagenesis of noncontiguous residues within a loop region can be desirable if it is known, e.g., through in silico modeling, that certain residues in the region will not participate in the desired function. The coordinate structure and spatial interrelationship between the defined regions, e.g., the functional amino acid residues in the defined regions of the fibronectin binding domain, e.g., the diversity that has been introduced, can be considered and modeled. Such modeling criteria include, e.g., amino acid residue side group chemistry, atom distances, crystallography data, etc. Accordingly, the number of fibronectin binding domains to be produced can be intelligently minimized.

In a preferred embodiment, one or more of the above steps are computer-assisted. In a particular embodiment, the computer assisted step comprises, e.g., mining the NCBI, Genbank, PFAM, and ProSite databases and, optionally, cross-referencing the results against the PDB structural database, whereby certain criteria of the invention are determined and used to design the desired loop diversity (FIG. 1). The method is also amenable to being carried out, in part or in whole, by a device, e.g., a computer driven device. For example, database mining, fibronectin module sequence selection, diversity design, oligonucleotide synthesis, PCR-mediated assembly of the foregoing, and expression and selection of candidate fibronectin binding domains that bind a given target, can be carried out in part or entirely, by interlaced devices. In addition, instructions for carrying out the method, in part or in whole, can be conferred to a medium suitable for use in an electronic device for carrying out the instructions. In sum, the methods of the invention are amenable to a high throughput approach comprising software (e.g., computer-readable instructions) and hardware (e.g., computers, robotics, and chips).

IID. Universal Walk-Through Mutagenesis Libraries

In one general aspect, the present invention includes a walk-through mutagenesis (WTM) library of fibronectin Type 3 domain polypeptides useful in screening for the presence of one or more polypeptides having a selected binding or enzymatic activity. The library polypeptides include (a) regions A, AB, B, C, CD, D, E, EF, F, and G having wildtype amino acid sequences of a selected native fibronectin Type 3 polypeptide or polypeptides, and (b) loop regions BC, DE, and FG having one or more selected lengths. At least one selected loop region of a selected length contains a library of WTM sequences encoded by a library of coding sequences that encode, at each loop position, a conserved or selected semi-conserved consensus amino acid and, if the consensus amino acid has an occurrence frequency equal to or less than a selected threshold frequency of at least 50%, a single common target amino acid and any co-produced amino acids (amino acids produced by the coding sequences at a given position as a result of codon degeneracy).

In constructing a WTM library within a given loop of a given loop length, the variability profile is used to define a sequence of fixed and “variable” positions, i.e., positions at which a target WTM amino acid can be introduced. The number of fixed positions (no substitutions made) will depend on the selected threshold frequency for the consensus amino acid at each position. Example 3 illustrates the design of WTM loop sequences for each of nine representative target amino acids for FG/11. Conserved or semi-conserved residues were observed for N79, G79, G/S 81 and S84 (two consensus amino acids were placed at position 81, reflecting the high frequency of both G (60%) and S (31%)). These four positions (and the two terminal positions S and K) were fixed, and the WTM target amino acid was introduced at each of the other positions in the sequence. For the target amino acid lysine (K), the example shows the various WTM sequences containing a K at from one up to all substitution positions in the loop. Table 5 in the example shows the substitution sequences for all six selected loops: BC/11, BC/14, BC/15, DE/6, FG/8, and FG/11.
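The walk-through idea described above, in which each variable position carries either its wild-type residue or the single target amino acid while fixed positions are left untouched, can be sketched in Python as follows. The loop sequence `STGSK`, the choice of variable positions and the function name are hypothetical and are not taken from Example 3; the sketch works at the amino acid level only and ignores co-produced amino acids arising from codon degeneracy.

```python
from itertools import product

def wtm_sequences(wild_type, variable_positions, target):
    """Enumerate walk-through sequences: each variable position is either
    wild type or the target amino acid; fixed positions never change."""
    choices = []
    for index, residue in enumerate(wild_type):
        if index in variable_positions and residue != target:
            choices.append((residue, target))
        else:
            choices.append((residue,))
    return ["".join(combo) for combo in product(*choices)]

# Hypothetical five-residue loop with two variable positions (0-based).
variants = wtm_sequences("STGSK", variable_positions={1, 3}, target="K")
```

With two variable positions the sketch returns 2² = 4 sequences, running from the unmodified wild type to the sequence with K at every substitution position.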

In an alternative embodiment, the threshold for consensus sequences is set at 100%, so that all residue positions will be selected for introduction of a WTM target amino acid. This approach is illustrated in Example 5.

Once the WTM loop sequences are selected, a library of coding-sequence oligonucleotides encoding all of the identified WTM sequences is constructed, making codon substitutions as shown that are effective to preserve the existing consensus amino acid, but also encode the selected target amino acid, and any other co-product amino acids encoded by degenerate codons, as detailed in Examples 4 and 5 for the two different WTM libraries (with and without consensus threshold constraints). FIGS. 15A-15I show the base fixed sequence and variable positions of a BC loop length size 11 and the amino acid matrix showing wild type, the WTM target positions and the extra potential diversity generated from the degenerate WTM codons for each of the selected amino acids K (15A), Q (15B), D (15C), Y (15D), L (15E), P (15F), S (15G), H (15H), and G (15I). FIGS. 16A-16I, 17A-17I, 18A-18I provide similar tables for BC loop length 15 (BC/15), DE loop length 6 (DE/6), and FG loop length 11 (FG/11), respectively.
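To illustrate how a degenerate codon that preserves the wild-type residue and adds the target residue can also encode co-product amino acids, the following Python sketch mixes a wild-type codon and a target codon base by base and translates every resulting codon with the standard genetic code. The codons used (GGT for glycine, AAA for lysine) and the function names are assumptions chosen for the example; they are not the codon choices of Examples 4 and 5.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): AMINO[i] for i, c in enumerate(product(BASES, repeat=3))}

def wtm_codon_mix(wt_codon, target_codon):
    """Per-position base mixture covering both the wild-type and target codons."""
    return [sorted({w, t}) for w, t in zip(wt_codon, target_codon)]

def encoded_amino_acids(base_mix):
    """All amino acids (or '*' for stop) encoded by a per-position base mixture."""
    return {CODON_TABLE["".join(codon)] for codon in product(*base_mix)}

mix = wtm_codon_mix("GGT", "AAA")          # wild type Gly, target Lys
encoded = encoded_amino_acids(mix)
co_products = encoded - {"G", "K"}
```

In this toy case the mixture (A/G)(A/G)(A/T) encodes eight codons: besides glycine and lysine it co-produces D, E, N, R and S, whereas a wild-type/target pair differing at one base, such as GAT and GAA, yields no co-products.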

The library of coding sequences for the WTM loops is added to the framework sequences, as detailed in the section below and in Example 6, to construct the library of coding sequences for the WTM polypeptide libraries. In one preferred embodiment, the coding library includes coding sequences for each of the six different loops and loop lengths, and for each of the nine selected “representative” WTM target amino acids (see below).

The library of polypeptides may be encoded by an expression library format that includes a ribosome display library, a polysome display library, a phage display library, a bacterial expression library, or a yeast display library.

The libraries may be used in a method of identifying a polypeptide having a desired binding affinity, in which the WTM library is screened to select for a fibronectin binding domain having a desired binding affinity. The efficiency of a 14/FN WTM library constructed in accordance with the present invention, for selecting FN3 polypeptides having a high binding affinity for a selected antigen, TNFα, can be appreciated from Example 8, with respect to FIGS. 24A-24C.

IIE. Universal Natural-Variant Combinatorial Mutagenesis Libraries

In another general aspect, the invention includes a natural-variant combinatorial library of fibronectin Type 3 domain polypeptides useful in screening for the presence of one or more polypeptides having a selected binding or enzymatic activity. The library polypeptides include (a) regions A, AB, B, C, CD, D, E, EF, F, and G having wildtype amino acid sequences of a selected native fibronectin Type 3 polypeptide or polypeptides, and (b) loop regions BC, DE, and FG having selected lengths. At least one selected loop region of a selected length contains a library of natural-variant combinatorial sequences expressed by a library of coding sequences that encode, at each loop position, a conserved or selected semi-conserved consensus amino acid and, if the consensus amino acid has a frequency of occurrence equal to or less than a selected threshold frequency of at least 50%, other natural variant amino acids, including semi-conserved amino acids and variable amino acids whose occurrence rate is above a selected minimum threshold occurrence at that position, or their chemical equivalents.

In constructing a natural-variant combinatorial library for a given loop and loop length, the variability profile is used to define a sequence of fixed and “variable” positions, i.e., positions at which amino acid variations can be introduced. As in the WTM libraries, the number of fixed positions (no substitutions made) will depend on the selected threshold frequency for the consensus amino acid at each position. If, for example, the selected frequency threshold was set at about 60%, the conserved or semi-conserved residues for FG/11 are N79, G79, G/S 81 and S84 (from Example 3) and natural-variant substitutions would not be made at these positions. Conversely, if the threshold frequency is set at 100%, all positions would be considered open to variation, recognizing that a single amino acid with a frequency of 100% at a loop position would not be substituted, and a position that had one very dominant amino acid, e.g., with a frequency of 90%, might be substituted only if the low-frequency variant(s) were chemically dissimilar to the dominant amino acid.

From the amino acid profile for a given loop and loop length, and knowing which of the positions will be held fixed and which will admit variations, the amino acid substitutions at each variable position can be selected. In general, the number of variations that are selected (including co-produced amino acids) will depend on the number of variable substitution positions in the loop and the average number of variations per substituted loop position. Ideally, the number of variations selected will be such as to maintain the diversity of the loop in the range 10⁵-10⁷ sequences, allowing a library of two variable-sequence loops in the range of about 10¹². As will be seen from Example 9 below, an FG/11 loop having amino acid variations at 10 of the 11 positions will have an average of about 4 amino acid variations/position, whereas a shorter loop, such as DE/6, will admit up to 6 or more variations per position. Of course, if natural-variant substitutions are introduced into a single loop only, many more variations per position can be accommodated.
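The diversity arithmetic in this paragraph is a simple product of the number of amino acid choices at each loop position. A short Python check, using hypothetical helper names, is given below; the figure of exactly 4 variations at each of 10 positions is a rounded illustration of the “about 4” average stated above.

```python
from math import prod

def loop_diversity(variants_per_position):
    """Number of distinct loop sequences: product of choices per position."""
    return prod(variants_per_position)

def library_size(*loops):
    """Combined diversity when several loops are varied independently."""
    return prod(loop_diversity(loop) for loop in loops)

# FG/11 with 4 choices at 10 positions and 1 fixed position.
fg11 = [4] * 10 + [1]
fg11_diversity = loop_diversity(fg11)   # 4**10
```

Here 4¹⁰ = 1,048,576, i.e. about 10⁶ sequences, which sits inside the 10⁵-10⁷ range; two loops of about 10⁶ sequences each combine to about 10¹².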

The particular natural variant amino acids that are selected for each position will generally include the amino acids having the highest frequencies, while limiting the number of co-produced amino acids, and secondarily, preserving chemical diversity at each site. Thus, if the codon change for one variant amino acid would produce several co-produced amino acids, that variant would likely be omitted, and/or a chemically equivalent amino acid would be sought. Similarly, if one natural variant is chemically equivalent to another one, one of the two could be omitted. In summary, the natural-variant loop sequences are constructed to include the highest frequency natural variants, while minimizing co-produced amino acids and minimizing redundancy in amino acid side chain properties, and to limit the total diversity of the loop and loop length if necessary, i.e., where sequence variation is introduced into more than one loop. The application of these rules can be seen in the exemplary variant substitutions in a BC/11 loop (FIG. 25B) from the BC/11 loop profile given in FIG. 9. Similarly, the application of the rules can be seen in the exemplary variant substitutions in a DE/6 loop (FIG. 25B), from the DE loop profile in FIG. 12.
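Part of this selection logic, namely ranking natural variants by frequency and dropping a variant that is chemically equivalent to one already chosen, can be sketched in Python as follows. The grouping follows the nine-group scheme given in the definitions above; the minimum frequency, the cap on variants per position, the function names and the toy frequencies are assumptions, and the sketch deliberately does not model co-produced amino acids.

```python
# Chemical-equivalence groups from the definitions section (one-letter codes).
CHEMICAL_GROUPS = [
    set("G"), set("AVLI"), set("ST"), set("DE"), set("NQ"),
    set("RKH"), set("CM"), set("YF"), set("WPH"),
]

def chemically_equivalent(a, b):
    """True if two different amino acids share at least one group."""
    return a != b and any(a in group and b in group for group in CHEMICAL_GROUPS)

def select_variants(freqs, min_freq=0.05, max_variants=4):
    """Pick the highest-frequency variants at one position, skipping any
    that is chemically equivalent to a variant already chosen."""
    chosen = []
    for amino_acid, freq in sorted(freqs.items(), key=lambda kv: (-kv[1], kv[0])):
        if freq < min_freq or len(chosen) == max_variants:
            break
        if any(chemically_equivalent(amino_acid, c) for c in chosen):
            continue
        chosen.append(amino_acid)
    return chosen

# Hypothetical frequency profile for one loop position.
toy_profile = {"S": 0.35, "T": 0.25, "D": 0.20, "G": 0.12, "W": 0.05, "A": 0.03}
picked = select_variants(toy_profile)
```

For the toy profile, T is skipped because it is grouped with the already chosen S, and A falls below the assumed minimum frequency, leaving S, D, G and W.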

Once the natural-variant loop sequences are selected, a library of coding-sequence oligonucleotides encoding all of the identified natural-variant sequences is constructed, making codon substitutions that are effective to preserve the existing consensus amino acid, and encode the selected variant amino acids, including variants encoded by degenerate codons.

The library of coding sequences for the natural-variant loops is added to the framework sequences, as detailed in the section below and in Example 6, to construct the library of coding sequences for the natural-variant polypeptide libraries. In one preferred embodiment, the coding library includes coding sequences for a pair of BC/DE, BC/FG or DE/FG loops, where each loop in the pair has one selected length, e.g., BC/11 and DE/6. Such a two-loop library is described in Examples 9 and 10 below. After selecting high-affinity binding (or enzymatic) polypeptides from this library, a second “beneficial” library can be constructed that includes the beneficial mutations contained in one or both loops of the original two-loop natural-variant library, and natural-variant amino acids in the third loop, i.e., the previously fixed-sequence loop.

The natural-variant combinatorial library may have the following sequences for the indicated loops and loop lengths: (a) BC loop length of 11, and the amino acid sequence identified by SEQ ID NOS: 43 or 49; (b) BC loop length of 14, and the amino acid sequence identified by SEQ ID NOS: 44 or 50; (c) BC loop length of 15, and the amino acid sequence identified by SEQ ID NOS: 45 or 51; (d) DE loop length of 6, and the amino acid sequence identified by SEQ ID NOS: 46 or 52; (e) FG loop length of 8, and the amino acid sequence identified by SEQ ID NO: 47, for the first N-terminal six amino acids, or SEQ ID NO: 53; and (f) FG loop length of 11, and the amino acid sequence identified by SEQ ID NO: 48, for the first N-terminal nine amino acids, or SEQ ID NO: 54.

The library of polypeptides may be encoded by an expression library that has the format of a ribosome display library, a polysome display library, a phage display library, a bacterial expression library, or a yeast display library.

The libraries may be used in a method of identifying a polypeptide having a desired binding affinity, in which the natural-variant combinatorial library is screened to select for a fibronectin binding domain having a desired binding affinity. The efficiency of a 14/FN natural-variant combinatorial library constructed in accordance with the present invention, for selecting FN3 polypeptides having a high binding affinity for each of two selected antigens, VEGF and HMGB1, can be appreciated from Example 9, with respect to FIGS. 25A-25C, and Example 10, with respect to FIGS. 26A-26C.

IIF. Synthesizing Universal Fibronectin Binding Domain Libraries

In one embodiment, the universal fibronectin binding domains of the invention are generated for screening by synthesizing individual oligonucleotides that encode the defined region of the polypeptide and have no more than one codon for the predetermined amino acid. This is accomplished by incorporating, at each codon position within the oligonucleotide, either the codon required for synthesis of the wild-type polypeptide or a codon for the predetermined amino acid, and is referred to as look-through mutagenesis (LTM) (see, e.g., U.S. Patent Publication No. 20050136428).
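The look-through scheme described above, one oligonucleotide per position with a single codon swapped for the predetermined amino acid, can be sketched in Python as follows. The wild-type codons, the target codon and the function name are hypothetical, and the sketch assumes the wild-type region does not already contain the target codon.

```python
def ltm_oligos(wt_codons, target_codon):
    """Return one oligonucleotide per codon position, each carrying the
    target codon at exactly one position and wild type elsewhere."""
    oligos = []
    for index, codon in enumerate(wt_codons):
        if codon == target_codon:
            continue  # position already carries the target codon
        variant = list(wt_codons)
        variant[index] = target_codon
        oligos.append("".join(variant))
    return oligos

# Hypothetical three-codon region (Ser-Gly-Thr) with lysine (AAA) as target.
oligos = ltm_oligos(["TCT", "GGT", "ACT"], "AAA")
```

For a region of three codons this gives three oligonucleotides, in contrast to the walk-through approach, in which the target can appear at several positions of the same sequence.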

In another embodiment, when diversity at multiple amino acid positions is required, walk-through mutagenesis (WTM) can be used (see, e.g., U.S. Pat. Nos. 6,649,340; 5,830,650; and 5,798,208; and U.S. Patent Publication No. 20050136428). WTM allows for multiple mutations to be made with a minimum number of oligonucleotides. The oligonucleotides can be produced individually, in batches, using, e.g., doping techniques, and then mixed or pooled as desired.

The mixture of oligonucleotides for generation of the library can be synthesized readily by known methods for DNA synthesis. The preferred method involves use of solid phase beta-cyanoethyl phosphoramidite chemistry (e.g., see U.S. Pat. No. 4,725,677). For convenience, an instrument for automated DNA synthesis can be used containing specified reagent vessels of nucleotides. The polynucleotides may also be synthesized to contain restriction sites or primer hybridization sites to facilitate the introduction or assembly of the polynucleotides representing, e.g., a defined region, into a larger gene context.

The synthesized polynucleotides can be inserted into a larger gene context, e.g., a single scaffold domain, using standard genetic engineering techniques. For example, the polynucleotides can be made to contain flanking recognition sites for restriction enzymes (e.g., see U.S. Pat. No. 4,888,286). The recognition sites can be designed to correspond to recognition sites that either exist naturally or are introduced in the gene proximate to the DNA encoding the region. After conversion into double stranded form, the polynucleotides are ligated into the gene or gene vector by standard techniques. By means of an appropriate vector (including, e.g., phage vectors, plasmids) the genes can be introduced into a cell-free extract, phage, prokaryotic cell, or eukaryotic cell suitable for expression of the fibronectin binding domain molecules.

Alternatively, partially overlapping polynucleotides, typically about 20-60 nucleotides in length, are designed. The internal polynucleotides are then annealed to their complementary partner to give a double-stranded DNA molecule with single-stranded extensions useful for further annealing. The annealed pairs can then be mixed together, extended, and ligated to form full-length double-stranded molecules using SOE-PCR (see, e.g., Example 3). Convenient restriction sites can be designed near the ends of the synthetic gene for cloning into a suitable vector. The full-length molecules can then be ligated into a suitable vector.
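One simple way to lay out partially overlapping polynucleotides is sketched below in Python. It is a minimal illustration, not the design procedure of Example 3: the function names, the 40-nucleotide oligo length, the 20-nucleotide overlap, and the placeholder gene sequence are all assumptions. Alternate oligos are written as the reverse complement so that neighboring oligos can anneal through their shared overlap.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def tile_oligos(gene, oligo_len=40, overlap=20):
    """Split the top strand into windows of oligo_len that overlap their
    neighbors by `overlap`; odd-numbered windows are returned as
    bottom-strand (reverse-complement) oligos."""
    if not 0 < overlap < oligo_len:
        raise ValueError("overlap must be positive and shorter than oligo_len")
    step = oligo_len - overlap
    oligos = []
    for index, start in enumerate(range(0, len(gene) - overlap, step)):
        segment = gene[start:start + oligo_len]
        oligos.append(segment if index % 2 == 0 else reverse_complement(segment))
    return oligos


# Hypothetical 100-nt placeholder sequence, not a real FN3 gene.
gene = "ATGC" * 25
oligos = tile_oligos(gene)
for oligo in oligos:
    print(len(oligo), oligo)
```

The final window may be shorter than oligo_len when the gene length is not a multiple of the step; a real design would also balance melting temperatures across overlaps, which this sketch does not attempt.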

When partially overlapping polynucleotides are used in the gene assembly, a set of degenerate nucleotides can also be directly incorporated in place of one of the polynucleotides. The appropriate complementary strand is synthesized during the extension reaction from a partially complementary polynucleotide from the other strand by enzymatic extension with a polymerase. Incorporation of the degenerate polynucleotides at the stage of synthesis also simplifies cloning where more than one domain or defined region of a gene is mutagenized or engineered to have diversity.

In another approach, the fibronectin binding domain is present on a single stranded plasmid. For example, the gene can be cloned into a phage vector or a vector with a filamentous phage origin of replication that allows propagation of single-stranded molecules with the use of a helper phage. The single-stranded template can be annealed with a set of degenerate polynucleotides representing the desired mutations and elongated and ligated, thus incorporating each analog strand into a population of molecules that can be introduced into an appropriate host (see, e.g., Sayers, J. R. et al., Nucleic Acids Res. 16: 791-802 (1988)). This approach can circumvent multiple cloning steps where multiple domains are selected for mutagenesis.

Polymerase chain reaction (PCR) methodology can also be used to incorporate polynucleotides into a gene, for example, loop diversity into β-strand framework regions. For example, the polynucleotides themselves can be used as primers for extension. In this approach, polynucleotides encoding the mutagenic cassettes corresponding to the defined region (or portion thereof) are complementary to each other, at least in part, and can be extended to form a large gene cassette (e.g., a fibronectin binding domain) using a polymerase, e.g., using PCR amplification.

The size of the library will vary depending upon the loop length and the amount of sequence diversity which needs to be represented using, e.g., WTM or LTM. Preferably, the library will be designed to contain less than 10¹⁵, 10¹⁴, 10¹³, 10¹², 10¹¹, 10¹⁰, 10⁹, 10⁸, 10⁷, and more preferably, 10⁶ fibronectin binding domains.
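The dependence of library size on loop length and per-position diversity can be illustrated with a short Python calculation. The loop lengths below are hypothetical, and the assumption of two choices per position (wild-type or one predetermined residue) is a simplification of WTM; twenty choices per position corresponds to full saturation.

```python
def theoretical_diversity(loop_lengths, choices_per_position):
    """Number of distinct sequences when every position in every loop
    independently takes one of `choices_per_position` residues."""
    size = 1
    for length in loop_lengths:
        size *= choices_per_position ** length
    return size


# Hypothetical loop lengths for three loops (assumed values for illustration).
loops = (7, 5, 8)
wtm_size = theoretical_diversity(loops, 2)          # 2**20
saturation_size = theoretical_diversity(loops, 20)  # 20**20

print(f"two choices per position:    {wtm_size:.2e}")
print(f"twenty choices per position: {saturation_size:.2e}")
```

Under these assumptions the two-choice library stays near 10⁶ members, while saturating the same twenty positions would exceed 10¹⁵ by many orders of magnitude.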

The description above has centered on representing fibronectin binding domain diversity by altering the polynucleotide that encodes the corresponding polypeptide. It is understood, however, that the scope of the invention also encompasses methods of representing the fibronectin binding domain diversity disclosed herein by direct synthesis of the desired polypeptide regions using protein chemistry. In carrying out this approach, the resultant polypeptides still incorporate the features of the invention except that the use of a polynucleotide intermediate can be eliminated.

For the libraries described above, whether in the form of polynucleotides and/or corresponding polypeptides, it is understood that the libraries may also be attached to a solid support, such as a microchip, and preferably arrayed, using art recognized techniques.

The method of this invention is especially useful for modifying candidate fibronectin binding domain molecules by way of affinity maturation. Alterations can be introduced into the loops and/or into the β-strand framework (constant) region of a fibronectin binding domain. Modification of the loop regions can produce fibronectin binding domains with better ligand binding properties, and, if desired, catalytic properties. Modification of the β-strand framework region can also lead to the improvement of chemo-physical properties, such as solubility or stability, which are especially useful, for example, in commercial production, bioavailability, and affinity for the ligand. Typically, the mutagenesis will target the loop region(s) of the fibronectin binding domain, i.e., the structure responsible for ligand-binding activity which can be made up of the three loop regions. In a preferred embodiment, an identified candidate binding molecule is subjected to affinity maturation to increase the affinity/avidity of the binding molecule to a target ligand.

IIG. Expression and Screening Systems

Libraries of polynucleotides generated by any of the above techniques or other suitable techniques can be expressed and screened to identify fibronectin binding domain molecules having desired structure and/or activity. Expression of the fibronectin binding domain molecules can be carried out using cell-free extracts (e.g., ribosome display), phage display, prokaryotic cells, or eukaryotic cells (e.g., yeast display).

In one embodiment, the polynucleotides are engineered to serve as templates that can be expressed in a cell free extract. Vectors and extracts as described, for example, in U.S. Pat. Nos. 5,324,637; 5,492,817; and 5,665,563 can be used, and many are commercially available. Ribosome display and other cell-free techniques for linking a polynucleotide (i.e., a genotype) to a polypeptide (i.e., a phenotype) can be used, e.g., Profusion™ (see, e.g., U.S. Pat. Nos. 6,348,315; 6,261,804; 6,258,558; and 6,214,553).

Alternatively, the polynucleotides of the invention can be expressed in a convenient E. coli expression system, such as that described by Pluckthun and Skerra (Pluckthun, A. and Skerra, A., Meth. Enzymol. 178: 476-515 (1989); Skerra, A. et al., Biotechnology 9: 273-278 (1991)). The mutant proteins can be expressed for secretion in the medium and/or in the cytoplasm of the bacteria, as described by M. Better and A. Horwitz, Meth. Enzymol. 178: 476 (1989). In one embodiment, the fibronectin binding domain sequences are attached to the 3′ end of a sequence encoding a signal sequence, such as the ompA, phoA or pelB signal sequence (Lei, S. P. et al., J. Bacteriol. 169: 4379 (1987)). These gene fusions are assembled in a dicistronic construct, so that they can be expressed from a single vector and secreted into the periplasmic space of E. coli, where they will refold and can be recovered in active form (Skerra, A. et al., Biotechnology 9: 273-278 (1991)).

In another embodiment, the fibronectin binding domain sequences are expressed on the membrane surface of a prokaryote, e.g., E. coli, using a secretion signal and lipidation moiety as described, e.g., in US20040072740A1; US20030100023A1; and US20030036092A1.

In still another embodiment, the polynucleotides can be expressed in eukaryotic cells such as yeast using, for example, yeast display as described, e.g., in U.S. Pat. Nos. 6,423,538; 6,331,391; and 6,300,065. In this approach, the fibronectin binding domain molecules of the library are fused to a polypeptide that is expressed and displayed on the surface of the yeast.

Higher eukaryotic cells for expression of the fibronectin binding domain molecules of the invention can also be used, such as mammalian cells, for example myeloma cells (e.g., NS/0 cells), hybridoma cells, or Chinese hamster ovary (CHO) cells. Typically, the fibronectin binding domain molecules when expressed in mammalian cells are designed to be expressed into the culture medium, or expressed on the surface of such a cell. The fibronectin binding domain can be produced, for example, as a single individual module or as multimeric chains, such as dimers or trimers, that can be composed of the same module or of different module types (e.g., ¹⁰FN3-¹⁰FN3: homodimer; ¹⁰FN3-⁵FN3: heterodimer).

The screening of the expressed fibronectin binding domain (or fibronectin binding domain produced by direct synthesis) can be done by any appropriate means. For example, binding activity can be evaluated by standard immunoassay and/or affinity chromatography. Screening of the fibronectin binding domain of the invention for catalytic function, e.g., proteolytic function, can be accomplished using a standard hemoglobin plaque assay as described, for example, in U.S. Pat. No. 5,798,208. The ability of candidate fibronectin binding domains to bind therapeutic targets can be assayed in vitro using, e.g., a Biacore instrument, which measures binding rates of a fibronectin binding domain to a given target or ligand. In vivo assays can be conducted using any of a number of animal models and then subsequently tested, as appropriate, in humans.

IIH. Analysis and Screening of FN3 WTM Libraries for Catalytic Function.

FN3 WTM libraries can also be used to screen for FN3 proteins that possess catalytic activity. The study of proteins has revealed that certain amino acids play a crucial role in their structure and function. For example, it appears that only a discrete number of amino acids participate in the catalytic event of an enzyme. Serine proteases are a family of enzymes present in virtually all organisms, which have evolved a structurally similar catalytic site characterized by the combined presence of serine, histidine and aspartic acid. These amino acids form a catalytic triad which, possibly along with other determinants, stabilizes the transition state of the substrate. The functional role of this catalytic triad has been confirmed by individual and by multiple substitutions of serine, histidine and aspartic acid by site-directed mutagenesis of serine proteases, and the importance of the interplay between these amino acid residues in catalysis is now well established. These same three amino acids are involved in the enzymatic mechanism of certain lipases as well. FIG. 26 is a schematic depiction of a “walk-through” mutagenesis of an active site that utilizes the serine, histidine and aspartic acid triad.

Similarly, a large number of other types of enzymes are characterized by the peculiar conformation of their catalytic site and the presence of certain kinds of amino acid residues in the site that are primarily responsible for the catalytic event. For an extensive review, see Enzyme Structure and Mechanism, 1985, by A. Fersht, Freeman Ed., New York.

Though it is clear that certain amino acids are critical to the mechanism of catalysis, it is difficult, if not impossible, to predict which position (or positions) an amino acid must occupy to produce a functional site such as a catalytic site. Unfortunately, the complex spatial configuration of amino acid side chains in proteins and the interrelationship of different side chains in the catalytic pocket of enzymes are insufficiently understood to allow for such predictions. Selective site-directed mutagenesis and saturation mutagenesis are of limited utility for the study of protein structure and function in view of the enormous number of possible variations in complex proteins.

Protein libraries generated by any of the above WTM/CBM/LTM techniques or other suitable techniques can be screened to identify variants of desired structure or activity.

By comparing the properties of a wild-type protein and the variants generated, it is possible to identify individual amino acids or domains of amino acids that confer binding and/or catalytic activity. Usually, the region studied will be a functional domain of the protein such as a binding domain. For example, the region can be the external BC, DE and FG loop binding regions of the FN3 domain. The screening can be done by any appropriate means. For example, catalytic activity can be ascertained by suitable assays for substrate conversion and binding activity can be evaluated by standard immunoassay and/or affinity chromatography.

From the chemical properties of the side chains, it appears that only a selected number of natural amino acids preferentially participate in a catalytic event. These amino acids belong to the group of polar and neutral amino acids such as Ser, Thr, Asn, Gln, Tyr, and Cys, the group of charged amino acids, Asp and Glu, Lys and Arg, and especially the amino acid His. Typical polar and neutral side chains are those of Cys, Ser, Thr, Asn, Gln and Tyr. Gly is also considered to be a borderline member of this group. Ser and Thr play an important role in forming hydrogen bonds. Thr has an additional asymmetry at the beta carbon, therefore only one of the stereoisomers is used. The acid amides Gln and Asn can also form hydrogen bonds, the amido groups functioning as hydrogen donors and the carbonyl groups functioning as acceptors. Gln has one more CH₂ group than Asn, which renders the polar group more flexible and reduces its interaction with the main chain. Tyr has a very polar hydroxyl group (phenolic OH) that can dissociate at high pH values. Tyr behaves somewhat like a charged side chain; its hydrogen bonds are rather strong.

Histidine (His) has a heterocyclic aromatic side chain with a pK value of 6.0. In the physiological pH range, its imidazole ring can be either uncharged or charged, after taking up a hydrogen ion from the solution. Since these two states are readily available, His is quite suitable for catalyzing chemical reactions. It is found in most of the active centers of enzymes.

Asp and Glu are negatively charged at physiological pH. Because of its short side chain, the carboxyl group of Asp is rather rigid with respect to the main chain. This may be the reason why the carboxyl group in many catalytic sites is provided by Asp and not by Glu. Charged acids are generally found at the surface of a protein.

Therefore, several different regions or loops of a FN3 protein domain can be mutagenized simultaneously. The same or a different amino acid can be “walked-through” each loop region. This enables the evaluation of amino acid substitutions in conformationally related regions such as the regions which, upon folding of the protein, are associated to make up a functional site such as the catalytic site of an enzyme or the binding site of an antibody. This method provides a way to create modified or completely new catalytic sites. As depicted in FIG. 26, the three loop regions of FN3, which can be engineered to confer target ligand binding, can be mutagenized simultaneously, or separately within the BC, DE and FG loops, to assay for contributing catalytic functions at this binding site. Therefore, the introduction of additional “catalytically important” amino acids into a ligand binding region of a protein may result in de novo catalytic activity toward the same target ligand.

Hence, new structures can be built on the natural “scaffold” of an existing protein by mutating only relevant regions by the method of this invention. The method of this invention is suited to the design of de novo catalytic binding proteins as compared to the isolation of naturally occurring catalytic antibodies. Presently, catalytic antibodies can be prepared by an adaptation of standard somatic cell fusion techniques. In this process, an animal is immunized with an antigen that resembles the transition state of the desired substrate to induce production of an antibody that binds the transition state and catalyzes the reaction. Antibody-producing cells are harvested from the animal and fused with an immortalizing cell to produce hybrid cells. These cells are then screened for secretion of an antibody that catalyzes the reaction. This process is dependent upon the availability of analogues of the transition state of a substrate. The process may be limited because such analogues are likely to be difficult to identify or synthesize in most cases.

The method of this invention can be used to produce many different enzymes or catalytic antibodies, including oxidoreductases, transferases, hydrolases, lyases, isomerases and ligases. Among these classes, of particular importance will be the production of improved proteases, carbohydrases, lipases, dioxygenases and peroxidases. These and other enzymes that can be prepared by the method of this invention have important commercial applications for enzymatic conversions in health care, cosmetics, foods, brewing, detergents, environment (e.g., wastewater treatment), agriculture, tanning, textiles, and other chemical processes. These include, but are not limited to, diagnostic and therapeutic applications, conversions of fats, carbohydrates and protein, degradation of organic pollutants and synthesis of chemicals. For example, therapeutically effective proteases with fibrinolytic activity, or activity against viral structures necessary for infectivity, such as viral coat proteins, could be engineered. Such proteases could be useful anti-thrombotic agents or anti-viral agents against viruses such as AIDS, rhinoviruses, influenza, or hepatitis. In the case of oxygenases (e.g., dioxygenases), a class of enzymes requiring a co-factor for oxidation of aromatic rings and other double bonds, industrial applications in biopulping processes, conversion of biomass into fuels or other chemicals, conversion of waste water contaminants, bioprocessing of coal, and detoxification of hazardous organic compounds are possible applications of novel proteins.

Throughout the examples, the following materials and methods were used unless otherwise stated.

Materials and Methods

In general, the practice of the present invention employs, unless otherwise indicated, conventional techniques of chemistry, molecular biology, recombinant DNA technology, PCR technology, immunology (especially, e.g., antibody technology), expression systems (e.g., cell-free expression, phage display, ribosome display, and Profusion™), and any necessary cell culture that are within the skill of the art and are explained in the literature. See, e.g., Sambrook, Fritsch and Maniatis, Molecular Cloning: Cold Spring Harbor Laboratory Press (1989); DNA Cloning, Vols. 1 and 2, (D.N. Glover, Ed. 1985); Oligonucleotide Synthesis (M.J. Gait, Ed. 1984); PCR Handbook Current Protocols in Nucleic Acid Chemistry, Beaucage, Ed. John Wiley & Sons (1999) (Editor); Oxford Handbook of Nucleic Acid Structure, Neidle, Ed., Oxford Univ Press (1999); PCR Protocols: A Guide to Methods and Applications, Innis et al., Academic Press (1990); PCR Essential Techniques: Essential Techniques, Burke, Ed., John Wiley & Son Ltd (1996); The PCR Technique: RT-PCR, Siebert, Ed., Eaton Pub. Co. (1998); Current Protocols in Molecular Biology, eds. Ausubel et al., John Wiley & Sons (1992); Large-Scale Mammalian Cell Culture Technology, Lubiniecki, A., Ed., Marcel Dekker, Pub., (1990); Phage Display: A Laboratory Manual, C. Barbas (Ed.), CSHL Press, (2001); Antibody Phage Display, P O'Brien (Ed.), Humana Press (2001); Boder et al., Yeast surface display for screening combinatorial polypeptide libraries, Nature Biotechnology, 15(6):553-7 (1997); Boder et al., Yeast surface display for directed evolution of protein expression, affinity, and stability, Methods Enzymol., 328:430-44 (2000); ribosome display as described by Pluckthun et al. in U.S. Pat. No. 6,348,315, and Profusion™ as described by Szostak et al. in U.S. Pat. Nos. 6,258,558; 6,261,804; and 6,214,553, and bacterial periplasmic expression as described in US20040058403A1.

Further details regarding fibronectin and Fn3 sequence classification, identification, and analysis may be found, e.g., in SEQHUNT. A program to screen aligned nucleotide and amino acid sequences, Methods Mol. Biol. 1995; 51:1-15; and Wu et al., Clustering of highly homologous sequences to reduce the size of large protein databases, Bioinformatics. 2001 March; 17(3):282-3. Databases and search and analysis programs include the PFAM database at the Sanger Institute (pfam.sanger.ac.uk); the ExPASy PROSITE database (www.expasv.ch/prosite/); SBASE web (hydra.icgeb.trieste.it/sbase/); BLAST (www.ncbi.nlm.nih.gov/BLAST/); CD-HIT (bioinformatics.ljcrf.edu/cd-hi/); EMBOSS (www.hqmp.mrc.ac.uk/Software/EMBOSS/); PHYLIP (evolution.genetics.washington.edu/phylip.html); and FASTA (fasta.bioch.virginia.edu).

Briefly, a microbial expression and display system is used which has a demonstrated reliability for expressing fibronectin binding domain libraries. Typically, the fibronectin binding domain is joined by a linker peptide to another surface molecule, creating a fusion protein. A variety of methods are available for library expression and display including ribosome, phage, E. coli, and yeast surface display. All combine genotype-phenotype linkage to allow selection of novel target binding clones.

Yeast:

The fibronectin binding domain library (i.e., FN3) is transfected into the recipient bacterial/yeast hosts using standard techniques. Yeast can readily accommodate library sizes up to 10⁷, with 10³-10⁵ copies of each FN3 fusion protein being displayed on each cell surface. Yeast cells are easily screened and separated using flow cytometry and fluorescence-activated cell sorting (FACS) or magnetic beads. The eukaryotic secretion system and glycosylation pathways of yeast also allow FN3 type molecules to be displayed with N and O linked sugars on the cell surface.

The yeast display system utilizes the a-agglutinin yeast adhesion receptor to display proteins on the cell surface. The proteins of interest, in this case, FN3 WTM, LTM and CBM libraries, are expressed as fusion partners with the Aga2 protein.

These fusion proteins are secreted from the cell and become disulfide linked to the Aga1 protein, which is attached to the yeast cell wall (see Invitrogen, pYD1 Yeast Display product literature). The plasmid pYD1, prepared from an E. coli host by plasmid purification (Qiagen), is digested with the restriction enzymes BamHI and NotI and terminally dephosphorylated with calf intestinal alkaline phosphatase. Ligation of the pYD1 vector and the above SOE-PCR product WTM libraries (also digested by BamHI and NotI), E. coli (DH5α) transformation and selection on LB-ampicillin (50 mg/ml) plates were performed using standard molecular biology protocols to amplify the WTM libraries before electroporation into yeast cell hosts.

Methods for selecting expressed FN3 library variants having substantially higher affinities for target ligands (TNF, VEGF, VEGF-R, etc.), relative to the reference wild type FN3 domain, will now be described.

Candidate test ligands (TNF, VEGF, VEGF-R, etc.) are fluorescently labeled (either directly or indirectly via a biotin-streptavidin linkage as described above). Those library clones that efficiently bind the labeled antigens are then enriched by using FACS. This population of yeast cells is then re-grown and subjected to subsequent rounds of selection using increased levels of stringency to isolate a smaller subset of clones that recognize the target with higher specificity and affinity. The libraries are readily amenable to high-throughput formats, using, e.g., FITC labeled anti-Myc-tag FN3 binding domain molecules and FACS analysis for quick identification and confirmation. In addition, there are carboxyl terminal tags included which can be utilized to monitor expression levels and/or normalize binding affinity measurements.

To check for the display of the Aga2-FN3 fusion protein, an aliquot of yeast cells (8×10⁵ cells in 40 μl) from the culture medium is centrifuged for 5 minutes at 2300 rpm. The supernatant is aspirated and the cell pellet is washed with 200 μl of ice cold PBS/BSA buffer (PBS/BSA 0.5% w/v). The cells are re-pelleted and the supernatant removed before re-suspending in 100 μl of buffer containing the biotinylated TNFα (200 nM). The cells are left to bind the TNFα at 20° C. for 45 minutes, after which they are washed twice with PBS/BSA buffer before the addition of, and incubation with, streptavidin-FITC (2 mg/L) for 30 minutes on ice. Another round of washing in buffer is performed before final re-suspension in a volume of 400 μl of PBS/BSA. The cells are then analyzed on a FACScan (Becton Dickinson) using CellQuest software as per the manufacturer's directions.

Kinetic selections of the yeast displayed TNF-α fibronectin binding domain libraries involve initial labeling of cells with biotinylated TNF-α ligand followed by a time dependent chase in the presence of a large excess of un-biotinylated TNF-α ligand. Clones with slower dissociation kinetics are identified by streptavidin-PE labeling after the chase period and sorted using a high speed FACS sorter. After Aga2-FN3 induction, the cells are incubated with biotinylated TNF-α at saturating concentrations (400 nM) for 3 hours at 25° C. under shaking. After washing the cells, a 40 hour cold chase using unlabelled TNF-α (1 μM) at 25° C. is performed. The cells are then washed twice with PBS/BSA buffer, labeled with streptavidin-PE (2 mg/ml) and anti-HIS-FITC (25 nM) for 30 minutes on ice, washed and re-suspended, and then analyzed on a FACS ARIA sorter.
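The rationale for the chase step can be illustrated with a simple first-order dissociation model. The Python sketch below assumes that, in the presence of excess unlabelled ligand, rebinding of labelled ligand is negligible, so that the labelled fraction decays as exp(-k_off·t). The two dissociation rate constants are hypothetical values chosen for illustration and are not measurements from the specification.

```python
import math


def fraction_still_labeled(k_off_per_s, chase_hours):
    """Fraction of initially bound labelled ligand remaining after the chase,
    assuming first-order dissociation with no rebinding of labelled ligand."""
    return math.exp(-k_off_per_s * chase_hours * 3600.0)


# Hypothetical clones: one slow and one ten-fold faster dissociator.
slow = fraction_still_labeled(1e-5, 40)  # k_off = 1e-5 per second
fast = fraction_still_labeled(1e-4, 40)  # k_off = 1e-4 per second

print(f"slow clone retains {slow:.3f} of its label")
print(f"fast clone retains {fast:.2e} of its label")
```

Under these assumptions a ten-fold difference in k_off translates into a difference of several orders of magnitude in retained signal after a long chase, which is why the chase duration can be used to set selection stringency.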

Ribosome Display:

Ribosome display utilizes cell free in vitro coupled transcription/translation machinery to produce protein libraries. The FN3 library genes are inserted upstream of a kappa light immunoglobulin gene that does not have a termination stop codon, causing the ribosome to stall, but not release, when it reaches the end of the mRNA. Additionally, the kappa domain spacer serves to physically distance the FN3 protein from the ribosome complex so that the FN3 binding domain has better accessibility to recognize its cognate ligand. The mRNA library is introduced into either S30 E. coli ribosome extract preparations (Roche) or rabbit reticulocyte lysate (Promega). In either case, the 5′ end of the nascent mRNA can bind to ribosomes and undergo translation. During translation, the ligand-binding protein remains non-covalently attached to the ribosome along with its mRNA progenitor in a macromolecular complex (He, 2005; He, 1997; He, 2007).

The functional FN3 proteins can then bind to a specific ligand that is attached either to magnetic beads or to a microtiter well surface. During the enrichment process, non-specific variants are washed away before the specific FN3 binders are eluted. The bound mRNA is detected by RT-PCR using primers specific to the 5′ portion of FN3 and the 3′ portion of the kappa gene, respectively (FIG. 9). The amplified double stranded cDNA is then cloned into an expression vector for sequence analysis and protein production.

Prokaryotic translation reactions contained 0.2 M potassium glutamate, 6.9 mM magnesium acetate, 90 mg/ml protein disulfide isomerase (Fluka), 50 mM Tris acetate (pH 7.5), 0.35 mM each amino acid, 2 mM ATP, 0.5 mM GTP, 1 mM cAMP, 30 mM acetyl phosphate, 0.5 mg/ml E. coli tRNA, 20 mg/ml folinic acid, 1.5% PEG 8000, 40 ml S30 E. coli extract and 10 mg mRNA in a total volume of 110 ml. Translation was performed at 37° C. for 7 min, after which ribosome complexes were stabilized by 5-fold dilution in ice-cold selection buffer [50 mM Tris acetate (pH 7.5), 150 mM NaCl, 50 mM magnesium acetate, 0.1% Tween 20, 2.5 mg/ml heparin].

For eukaryotic ribosome display, we used the Flexi Rabbit Reticulocyte Lysate System (Promega). Eukaryotic translation reactions contained 40 mM KCl, 100 mg/ml protein disulfide isomerase (Fluka), 0.02 mM each amino acid, 66 ml rabbit reticulocyte lysate and 10 mg mRNA in a total volume of 100 ml. Translation was performed at 30° C. for 20 min, after which ribosome complexes were stabilized by 2-fold dilution in ice-cold PBS.

Affinity Selection for Target Ligands.

Stabilized ribosome complexes were incubated with biotinylated hapten [50 nM fluorescein-biotin (Sigma)] or antigen [100 nM IL-13 (Peprotech) biotinylated in-house] as appropriate at 4° C. for 1-2 h, followed by capture on streptavidin-coated M280 magnetic beads (Dynal). Beads were then washed to remove non-specifically bound ribosome complexes. For prokaryotic selections, five washes in ice-cold selection buffer were performed. For eukaryotic selections, three washes in PBS containing 0.1% BSA and 5 mM magnesium acetate were performed, followed by a single wash in PBS alone. Eukaryotic complexes were then incubated with 10 U DNAse I in 40 mM Tris-HCl, 6 mM MgCl2, 10 mM NaCl, 10 mM CaCl2 for 25 min at 37° C., followed by three further washes with PBS, 5 mM magnesium acetate, 1% Tween 20.

Recovery of mRNA from Selected Ribosome Complexes

For analysis of mRNA recovery without a specific disruption step, ribosome complexes bound to magnetic beads were directly processed into the reverse transcription reaction. For recovery of mRNA from prokaryotic selections by ribosome complex disruption, selected complexes were incubated in EB20 [50 mM Tris acetate (pH 7.5), 150 mM NaCl, 20 mM EDTA, 10 mg/ml Saccharomyces cerevisiae RNA] for 10 min at 4° C. To evaluate the efficiency of the 20 mM EDTA for recovery of mRNA from eukaryotic selections, ribosome complexes were incubated in PBS20 (PBS, 20 mM EDTA, 10 mg/ml S. cerevisiae RNA) for 10 min at 4° C. mRNA was purified using a commercial kit (High Pure RNA Isolation Kit, Roche). For prokaryotic samples, the DNAse I digestion option of the kit was performed; however, this step was not required for eukaryotic samples, as DNAse I digestion was performed during post-selection washes. Reverse transcription was performed on either 4 ml of purified RNA or 4 ml of immobilized, selected ribosome complexes (i.e. a bead suspension).

For prokaryotic samples, reactions contained 50 mM Tris-HCl (pH 8.3), 75 mM KCl, 3 mM MgCl2, 10 mM DTT, 1.25 primer, 0.5 mM PCR nucleotide mix (Amersham Pharmacia), 1 U RNasin (Promega) and 5 U SuperScript II (Invitrogen) and were performed by incubation at 50° C. for 30 min. For eukaryotic samples, reactions contained 50 mM Tris-HCl (pH 8.3), 50 mM KCl, 10 mM MgCl2, 0.5 mM spermine, 10 mM DTT, 1.25 mM RT primers, 0.5 mM PCR nucleotide mix, 1 U RNasin and 5 U AMV reverse transcriptase (Promega) and were performed by incubation at 48° C. for 45 min.

PCR of Selection Outputs

End-point PCR was performed to visualize amplification of the full-length construct. A 5 ml sample of each reverse transcription reaction was amplified with 2.5 U Taq polymerase (Roche) in 20 mM Tris-HCl (pH 8.4), 50 mM KCl, 1 mM MgCl2, 5% DMSO, containing 0.25 mM PCR nucleotide mix, 0.25 mM forward primer (T7B or T7KOZ for prokaryotic or eukaryotic experiments, respectively) and 0.25 mM RT primer. Thermal cycling comprised 94° C. for 3 min, then 94° C. for 30 s, 50° C. for 30 s and 72° C. for 1.5 min for 30 cycles, with a final step at 72° C. for 5 min. PCR products were visualized by electrophoresis on ethidium bromide stained agarose gels. The isolated PCR products can then be sub-cloned into a bacterial pBAD expression vector for soluble protein production (below).

Bacterial Expression and Production:

Competent E. coli host cells are prepared as per the manufacturer's instructions (Invitrogen pBAD expression system). Briefly, 40 μl of LMG 194 competent cells and 0.5 μl of pBAD FN3 constructs (approximately 1 μg DNA) are incubated together on ice for 15 minutes, after which a one minute 42° C. heat shock is applied. The cells are then allowed to recover for 10 minutes at 37° C. in SOC media before plating onto LB-Amp plates and growth overnight at 37° C. Single colonies are picked the next day for small scale liquid cultures to initially determine optimal L-arabinose induction concentrations for FN3 production. Replicates of each clone, after reaching an OD₆₀₀=0.5, are test induced with serial (1:10) titrations of L-arabinose (0.2% to 0.00002% final concentration) after overnight growth at room temperature. Test cultures (1 ml) are collected and pelleted, and 100 μl of 1×BBS buffer (10 mM, 160 mM NaCl, 200 mM Boric acid, pH=8.0) is added to resuspend the cells before the addition of 50 μl of lysozyme solution for 1 hour (37° C.). Cell supernatants from the lysozyme digestions are collected after centrifugation, and MgSO₄ is added to a final concentration of 40 mM. This solution is applied to PBS pre-equilibrated Ni-NTA columns. His-tagged bound FN3 samples are washed twice with PBS buffer, after which elution is accomplished with the addition of 250 mM imidazole. Purity of the soluble FN3 expression is then examined by SDS-PAGE.
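The serial 1:10 L-arabinose titration described above can be tabulated with a few lines of Python. The function name serial_dilution is hypothetical; the start and end concentrations are the 0.2% and 0.00002% values given in the text, and decimal arithmetic is used so that the tabulated values are exact.

```python
from decimal import Decimal


def serial_dilution(start, stop, factor=10):
    """List concentrations from `start` down to `stop` (inclusive),
    dividing by `factor` at each step. Inputs are decimal strings."""
    value, stop, factor = Decimal(start), Decimal(stop), Decimal(factor)
    series = []
    while value >= stop:
        series.append(value)
        value /= factor
    return series


# Final L-arabinose concentrations in percent, per the 1:10 titration.
arabinose_percent = serial_dilution("0.2", "0.00002")
for tube, conc in enumerate(arabinose_percent, start=1):
    print(f"tube {tube}: {conc}% L-arabinose")
```

The range from 0.2% to 0.00002% therefore corresponds to five induction conditions per clone.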

For larger scale E. coli cell culture (100 ml), pellets are collected by centrifugation after overnight growth at 25° C. The pellets are then re-suspended in PBS buffer (0.1% tween) and subjected to 5 rounds of repeated sonication (Virtis Ultrasonic cell Disrupter) to lyse the bacterial cell membrane and release the cytoplasmic contents. The suspension is first clarified by high speed centrifugation to collect the supernatant for further processing. This supernatant is then applied to PBS pre-equilibrated Ni-NTA columns. His-tagged bound FN3 samples are washed twice with PBS buffer, after which elution is accomplished with the addition of 250 mM imidazole. The pH of the supernatant is then adjusted to 5.5 with 6M HCl before loading onto a SP Sepharose HP cation exchange column (Pharmacia). The FN3 is eluted with a salt (NaCl) gradient, and fraction concentrations containing the FN3 are determined by optical density at 280 nm and verified by PAGE. Fractions containing FN3s are then pooled and dialyzed with PBS.

Octet Kinetic Analysis:

Binding affinities (KD=k_(d)/k_(a)=k_(off)/k_(on)) of the FN3 variants are calculated from the resultant association (k_(a)=k_(on)) and dissociation (k_(d)=k_(off)) rate constants as measured using an OCTET biolayer interferometry system (ForteBio, Inc). The ligand, e.g., TNF-α, is immobilized on the OCTET streptavidin capillary sensor tip surface and, in effect, allows monitoring of the monomeric FN3 kinetic binding. OCTET streptavidin tips are activated according to the manufacturer's instructions using 50 μM TNF-α. A solution of PBS/BSA is also introduced as a blocking agent.
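
As a minimal sketch of the relationship KD=k_(off)/k_(on) stated above, the calculation can be written as follows. The function name and the rate constants are hypothetical illustration values, not measurements from this work and not output of the ForteBio software.

```python
def dissociation_constant(k_on, k_off):
    """Return KD in molar units from k_on (1/(M*s)) and k_off (1/s)."""
    if k_on <= 0:
        raise ValueError("k_on must be positive")
    return k_off / k_on

# Hypothetical rate constants chosen only to illustrate the arithmetic.
kd = dissociation_constant(k_on=1.0e5, k_off=1.0e-3)
print(f"KD = {kd:.2e} M")
```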

For association kinetic measurements, FN3 variants are introduced at a concentration of 10 μg/ml. Dissociation is observed in PBS buffer without the agents. The kinetic parameters of the binding reactions are determined using ForteBio software. FIG. 25 displays OCTET results from the reference D2E7 anti-TNF-α antibodies and the 9-14 TNF FN3 clone.

Candidate clones are then isolated and plasmid preparations are performed to obtain fibronectin binding domain sequence information. The approach allows for a hypothesis-driven rational replacement of codons necessary to determine and optimize amino acid functionality in the loop of the fibronectin binding domain. Comparative sequence analysis and individual clone affinity/specificity profiles then determine which clones undergo affinity maturation.

Example 1 Methods for Bioinformatic-Guided Identification of Universal Fibronectin Binding Domain Library Sequences

In this example, universal BC, DE, and FG loop sequences for fibronectin binding domain library sequences are identified and selected using bioinformatics and the criteria of the invention. A generalized schematic of this process is presented in FIG. 1.

Briefly, the PFAM database was searched for a multiple sequence alignment containing only sequences belonging to the Fibronectin Type III family (FN3, PFAM ID: PF00041).

This search returned an initial dataset of 9321 protein sequences. It is noted, however, that this set of sequences can increase in number as additional sequences are cloned and entered into the database. The base dataset of ~9321 FN3 superfamily sequences included FN3 sequences from multiple sources: human fibronectin, tenascin, drosophila sevenless, protein tyrosine phosphatase, EPH tyrosine kinases, and cytokine receptors and Z proteins (see Table 1). It is appreciated that gene sequences related to FN1, FN2 and other protein scaffold framework domains can also be searched in these databases and identified in similar manner.

TABLE 2 Distribution of analyzed proteins in the Fibronectin module III superfamily.
Base dataset: All FN module III like proteins                 9321
First round filtered non redundant mammalian FN module III    5955
Non redundant human FN module III                             794

Within the starting FN3 sequences “base dataset”, the compiled loop sequences are likely to possess vastly different characteristics. The sequences will vary with respect to: originating species, originating modules, loop lengths, and other qualities. Due to originating sequence disparity among the “base dataset” members, trying to derive a coherent analysis may require further refinement from the starting “base dataset” such that subset datasets share one or more elected properties of interest. Therefore, from the initial ~9321 base dataset, redundant or duplicatively entered sequences were removed, leaving 5955 filtered non-redundant FN3 sequences. From this second subset, another set of sequences was further parsed into a third subset made of 794 human FN3 sequences. Constituent subset members sharing those respective properties can produce a more “standardized” set of sequences for meaningful comparative and unskewed data analysis within the subgroup. This process can be iterated, resulting in the generation of smaller datasets with a higher degree of predefined relationships. Widening sequence variability may be achieved, for example, by including the datasets containing non-human modules.
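
The successive filtering described above (removal of redundant entries, followed by selection of a species subset) can be sketched as follows. The record type `Fn3Record`, its field names, and the toy records are hypothetical and serve only to show the order of operations; they are not entries from the PFAM dataset.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Fn3Record:
    accession: str  # hypothetical identifier
    species: str
    sequence: str

def remove_redundant(records):
    """Keep the first record seen for each distinct sequence."""
    seen = set()
    unique = []
    for rec in records:
        if rec.sequence not in seen:
            seen.add(rec.sequence)
            unique.append(rec)
    return unique

def select_species(records, species):
    """Return only the records originating from the given species."""
    return [rec for rec in records if rec.species == species]

# Toy base dataset (illustrative sequences only).
base = [
    Fn3Record("rec1", "human", "SWTPPPGPVDG"),
    Fn3Record("rec2", "human", "SWTPPPGPVDG"),  # duplicate entry of rec1
    Fn3Record("rec3", "mouse", "SWEPPAGPIDG"),
    Fn3Record("rec4", "human", "TWKAPSGAVEG"),
]
non_redundant = remove_redundant(base)
human_only = select_species(non_redundant, "human")
print(len(base), len(non_redundant), len(human_only))
```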

The next step involved the designation of the β-scaffold framework and loop lengths and loop amino acid sequences, followed by a frequency analysis of these candidate loop sequences. Therefore, determination of variability profiles entails collection and selection of aligned amino acid sequences that share one or more defined properties of interest to create a dataset. We developed a β-strand and loop positional classification system using the positional numbering architecture of the 10/FN3 domain as a reference point. β-strands were first identified based on the crystal structure of FN3 fibronectin-protein modules 1, 4, 7, 8, 9, 10, 12, 13, 14, and 15. We also examined their dihedral angles and hydrogen bonding patterns to assist with the assignment of β-strands. Amino acid sequence alignments were then performed to further aid in the determination of β-strands and loops. For example in Table 3, using ¹⁰FN3 as a reference sequence, the amino acid positions for the β-strands and loops were designated as follows:

TABLE 3 β-strands and loop amino acid positions of FN3
β-strands    Loops
A: 8-14      AB: 15-16
B: 17-20     BC: 21-31
C: 32-39     CD: 40-44
D: 45-50     DE: 51-56
E: 57-59     EF: 60-67
F: 68-75     FG: 76-86
G: 87-94

For example, the aligned analysis indicated that the tryptophan at position 22 (W22) is conserved between the different modules. Based on crystallographic structural comparisons, W22 is located near the junction of the BC loop acceptor site and we chose position 21 as the loop starting point. There is a well conserved group of valine and isoleucine amino acids at position 20 between the FN3 domains (V20 in ¹⁰FN3), which appears to be part of the B β-strand. Immediately following W22, we found that there are structural differences between the overlaid loops before the BC loops reenter the C β-strand at a well-conserved Y/F32 found between the members. By designating positions 21 and 31 as the outer boundaries of the BC loop, we then segregated the “filtered dataset” into BC loop sizes and determined their amino acid frequency compositions. Similarly, this sequence/structure comparative analysis was applied to the DE and FG loops. We designated the DE loop as bordered by a conserved group of valine, leucine and isoleucine at position 50 (V50 in ¹⁰FN3) of the D β-strand and ending at position 56 (T56 in ¹⁰FN3) of the E β-strand. Likewise, we chose positions 76 (T76 in ¹⁰FN3) and 86 (K86 in ¹⁰FN3) as the F β-strand and G β-strand junctions for the FG loop.

Based on these aligned β strand boundary definitions, the loops were found not to be of one size, but to occur in differing lengths. The frequency distributions of the BC, DE and FG loop sizes were analyzed and tabulated (FIGS. 6-8). The loop size classification and description follow the nomenclature of: FN3 LOOP/LENGTH. For example, BC/15 refers to an FN3 BC loop length of 15 amino acids. In the BC loop definition, the scaffold loop boundaries were originally set for an eleven amino acid length. However, the BC loop ranges from nine to eighteen amino acids in length, with length fourteen (BC/14) and fifteen (BC/15) also being quite common (FIG. 6). Alignments of the loop amino acids can be performed between the differing loop lengths. For example, the proline at position 25 (P25) in BC/15 was found to be quite conserved within constituent BC/15 members. P25 is also observed to be quite prevalent in BC/14 and BC/11 members. The loop amino acid positions were then aligned to best approximate these equivalent positions. Hence in BC/15 the extra loop amino acids are given the designations 26a, 26b, 26c, and 26d, as they are located between residues 26 and 27 of the BC/15 loop. In loop BC/14 the extra loop amino acids are designated 26a, 26b, and 26c, as they are between residues 26 and 27 of the BC/14 loop. As stated above, numbering is based upon 10/FN3 (the 10^(th) FN3 module of human fibronectin) as a reference.
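
The position-labeling convention just described (extra residues inserted between positions 26 and 27 and lettered 26a, 26b, and so on) can be sketched as follows. The function name `bc_loop_labels` and its parameters are hypothetical; the sketch covers only BC loops of eleven or more residues, as the text does not state how shorter loops are numbered.

```python
import string

def bc_loop_labels(length, first=21, last=31, insert_after=26):
    """Return position labels for a BC loop of the given length.

    Residues beyond the eleven reference positions (21-31) are labeled
    26a, 26b, ... and placed between positions 26 and 27.
    """
    extra = length - (last - first + 1)
    if extra < 0:
        raise ValueError("loops shorter than the reference length are not handled")
    if extra > len(string.ascii_lowercase):
        raise ValueError("too many inserted residues to label with single letters")
    labels = []
    for pos in range(first, last + 1):
        labels.append(str(pos))
        if pos == insert_after:
            labels.extend(f"{pos}{string.ascii_lowercase[i]}" for i in range(extra))
    return labels

for size in (11, 14, 15):
    print(f"BC/{size}:", " ".join(bc_loop_labels(size)))
```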

Example 2 Assessing Loop Variability Profiles Using Bioinformatics Through Filtering and Cluster Analysis of Gene Sequences

The universal fibronectin binding domain libraries were designed by determining the variability profiles for the loops expressed in vivo. The variability profiles represent the cataloging of the different amino acids, and their respective rates of occurrence, present at a particular position in a dataset of aligned sequences. Size related families of loop sequences using the parameters set forth above within this starting “base dataset” can be identified and delineated. Comparative analysis of these multiple aligned loops provides variability profile information as to the existing and “tolerated” diversity for introducing amino acid changes that can lead to potential ligand binding. The designation of loops and their comprising amino acids can also be described for other scaffold like proteins using similar definitions.
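
A variability profile in the sense defined above, namely the amino acids and their rates of occurrence at each aligned position, can be tabulated with a short sketch such as the following. The function name and the four toy loop sequences are hypothetical and are not drawn from the analyzed dataset.

```python
from collections import Counter

def variability_profile(aligned_loops):
    """Return one {amino acid: relative frequency} dict per aligned position."""
    if not aligned_loops:
        return []
    length = len(aligned_loops[0])
    if any(len(seq) != length for seq in aligned_loops):
        raise ValueError("all loops in a size family must have the same length")
    total = len(aligned_loops)
    profile = []
    for column in zip(*aligned_loops):  # iterate position by position
        counts = Counter(column)
        profile.append({aa: n / total for aa, n in counts.most_common()})
    return profile

# Toy eleven-residue loops (illustrative only).
toy_loops = ["SWTPPPGPVDG", "SWEPPAGPIDG", "TWKAPSGAVEG", "SWTPPPDPVDG"]
profile = variability_profile(toy_loops)
for label, freqs in zip(range(21, 32), profile):
    print(label, freqs)
```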

The frequency distributions of the six loop sizes, shown in FIGS. 6 to 8, were generated to determine whether there were preferred BC, DE and FG loop sizes for FN3 sequences (Table 2). For the BC loop (FIG. 6), a fifteen amino acid loop size was the most common, accounting for 30% of the BC loop population. BC loop sizes 14 and 11 were the next most common sizes, occurring at 18% and 16% of the BC loop population, respectively. BC loop sizes 11, 14 and 15 were then chosen for our variability profile analysis.

Frequency size analysis of the DE loop demonstrated that loop size 6 occurred in nearly 45% of the analyzed FN3 sequences (FIG. 7). The other DE loop sizes 4, 5 and 7 accounted for no more than 16% each. DE loop size 6 accounted for 55% of the members, suggesting FN3 modules have more of a preference for DE loop size 6. In this case, only DE loop size 6 was chosen for further variability profile analysis. Frequency size analysis of the FG loop demonstrated that loop sizes 8 and 11 were the two most common loop sizes, accounting for 58% and 18% of the analyzed dataset respectively. Both FG loop sizes 8 and 11 were therefore included for further variability profile analysis. In an earlier version of the analysis, the FG loop lengths FG/8 and FG/11 were identified as FG/6 and FG/9, respectively, the difference being the addition of two C-terminal amino acids S, K to each FG loop length (as seen in Table 5 below). In some examples, where an FG/8 loop is identified by six amino acids, it is understood that these amino acids represent the six N-terminal amino acids of FG/8. Similarly, where an FG/11 loop is identified by nine amino acids, it is understood that these amino acids represent the nine N-terminal amino acids of FG/11.

TABLE 4 Lengths of FN3 BC, DE, and FG loops analyzed along with the number of sequences indicated
LOOP   length   number of sequences
BC     11       885
       14       1066
       15       1732
DE     6        2729
FG     8        1057
       11       3340

For each of the 6 selected loops (Table 4) a separate frequency analysis was executed to determine positional amino acid usage in the context of the selected loop. A simple frequency analysis using EMBOSS/prophecy (bioinfo.nhri.org.tw/cgi-bin/emboss/prophecy) was executed, generating a matrix representing the positional amino acid usage. The output matrix was then parsed and filtered in order to have relative frequency data for each position. The parser provides a very simple filter based on two thresholds (low and high). For each position the parser processes only amino acids with relative frequency above the “low” threshold until the cumulative frequency reaches the “high” threshold. If the high threshold is not reached, then the parser also evaluates the amino acids with relative frequency below the low threshold. A good low-high threshold combination was 5-50 because it provides good sensitivity for position classification. The parser therefore limits the listing to only the most frequent, and not all, amino acids that occur at each position. The parser output is visualized as frequency charts and the results are shown in FIGS. 9-14.
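
One possible reading of the two-threshold filter described above is sketched below; it is an assumption about how such a parser could be written, not the parser actually used. The function name `filter_position` and the example frequencies are hypothetical. Frequencies are given in percent, with the 5-50 combination as defaults.

```python
def filter_position(freqs, low=5.0, high=50.0):
    """Return the (amino acid, percent) pairs retained for one loop position.

    Pass 1 keeps amino acids above `low`, most frequent first, until the
    cumulative frequency reaches `high`. Pass 2 runs only if `high` was not
    reached and continues with the amino acids at or below `low`.
    """
    ranked = sorted(freqs.items(), key=lambda item: item[1], reverse=True)
    kept = []
    cumulative = 0.0
    for aa, freq in ranked:
        if freq <= low or cumulative >= high:
            break
        kept.append((aa, freq))
        cumulative += freq
    if cumulative < high:
        # `kept` is a prefix of `ranked`, so the remainder is all at or below `low`.
        for aa, freq in ranked[len(kept):]:
            if cumulative >= high:
                break
            kept.append((aa, freq))
            cumulative += freq
    return kept

# Hypothetical percentages for three positions.
print(filter_position({"W": 95, "F": 3, "Y": 2}))
print(filter_position({"P": 30, "A": 15, "S": 12, "T": 10, "G": 8}))
print(filter_position({"A": 40, "B": 4.5, "C": 4, "D": 3.5, "E": 3}))
```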

Example 2A Identifying Fixed and Non-Fixed Loop Positions with Thresholds

In one embodiment, a natural-variant combinatorial library with a conserved or selected semi-conserved consensus amino acid is designed as follows.

FN3 loop datasets are enumerated as above for amino acid variability and their relative frequencies at each aligned position (FIGS. 9-14). The above analysis identified positional preferences in all FN3 module loops, which are termed “variability profiles.” For example, in BC loop size 11 (FIG. 9), a tryptophan (W) at position 22 is found at a frequency of nearly 95%, demonstrating a high degree of selective pressure for its presence. Position 22 would be considered “conserved” for tryptophan (W22) (see Example 3 below) as it occurred above a predetermined 40% threshold level and was more than twice as common as the next most frequent amino acid at that position. This “fixed” residue is seen as the dominant amino acid with respect to the other amino acids that occur at that loop position. A “fixed” position may not be subject to mutagenic diversification in first round library building. At other BC loop size 11 positions, it was evident that position 25 favored proline (P) (approximately >45%), as it occurred twice as often as the next most frequent amino acids (<20%) in that same position. Hence, the P25 position is considered “fixed” as the predominant amino acid was two fold more frequent than the next. In contrast, prolines were also the most favored amino acid at positions 24, 26 and 28, but the other amino acids were also fairly abundant (8-19%). As positions 24, 26 and 28 have a wide range of populating amino acids with similar frequencies of occurrence, and the predominant amino acid was not more than two fold more frequent, these positions would be considered “variable.”
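
The “fixed” versus “variable” designation described above can be expressed as a simple rule, sketched below. The function name is hypothetical, the example percentages are illustrative rather than read from FIG. 9, and the treatment of values lying exactly on a boundary (inclusive here) is an assumption, since the text does not specify it.

```python
def classify_position(freqs, threshold=40.0, fold=2.0):
    """Classify one loop position from its {amino acid: percent} profile.

    A position is "fixed" when the most frequent amino acid meets the
    threshold and is at least `fold` times as frequent as the runner-up.
    Boundary values are treated as inclusive (an assumption).
    """
    ranked = sorted(freqs.values(), reverse=True)
    if not ranked:
        raise ValueError("empty profile")
    top = ranked[0]
    runner_up = ranked[1] if len(ranked) > 1 else 0.0
    if top >= threshold and top >= fold * runner_up:
        return "fixed"
    return "variable"

# Illustrative percentages only.
print(classify_position({"W": 95, "F": 3}))            # dominant residue
print(classify_position({"P": 45, "A": 20, "S": 10}))  # twice the runner-up
print(classify_position({"P": 19, "A": 15, "S": 12}))  # no dominant residue
```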

In BC loop size 14 (FIG. 10), the positional prevalence of particular amino acids was pronounced. In BC/14, the W at position 22 occurs at >95%, while position 25 has proline occurring at 70%. This indicated that W22 and P25 are heavily favored in BC/14 loops so that both positions are considered “fixed.” Also fixed was isoleucine at position 29. Here in BC/14, isoleucine (I29) occurred at >52%, more than two-fold more than the next two most common amino acids, valine and leucine, at 18% and 16% respectively.

For BC/15 loops (FIG. 11), W22, P25 and I29 were also found to be fixed positions, similar to BC/11 and BC/14. Additionally, glycines (G) at positions 26c and 26d occur at frequencies >75% and can also be considered “fixed.” By first determining these “fixed” positions and non-fixed positions (indicated by “X”), BC/15 would have a starting “fixed” sequence of: X21 W22 X23 X24 P25 X26 X26a D26b G26c G26d X27 X28 I29 X30 X31. Similarly, BC/14 and BC/11 would have starting “fixed” sequences of X21 W22 X23 X24 P25 X26 X26a X26b X26c X27 X28 I29 X30 X31 and X21 W22 X23 X24 P25 X26 X27 X28 I29 X30 X31, respectively.

The same variability profile analysis was performed for all the other loops. Briefly, for DE loop size 6 (FIG. 12), there were no loop positions where a single amino acid accounted for more than 35% of the representative positional amino acids. There was a slight preference for glycine at position 52 (52G), threonine at position 55 (55T) and serine at position 56 (56S). In this case, however, there were no predominant amino acids above a 40% threshold level to be considered “fixed” amino acids, and all loop positions are considered “variable.” For library construction, the most common amino acid in each loop position serves as the starting amino acid for first round mutagenesis. The DE/6 loop therefore has an all “variable” starting loop sequence of: X51 X52 X53 X54 X55 X56.

For FG loop size 8 (FIG. 13), the serine (S) at position 84 was “fixed” as it occurred in 75% of the populating sequences. All other remaining FG loop size 8 positions were “variable” using the most frequently occurring amino acid, yielding a FG/8 starting “fixed” sequence of X76 X77 X78 X79 X80 84S (the N-terminal six amino acids of FG/8). The FG/8 loop numbering jumps from 80 to 84 as the longer loop size nine is used as the reference size length based on 10/FN3. FG/11 contained four “fixed” positions: asparagine (76N), glycine (79G), serine (84S) and a “semi-fixed” glycine and serine at position 81. The three amino acids 76N, 79G and 84S, and the semi-fixed 81 G/S, occurred at greater than 40% frequency in the populating sequences. This analysis yielded a FG/11 starting “fixed” loop sequence of N76 X77 X78 79G X80 G/S81 X82 X83 S84 (the N-terminal nine amino acids of FG/11).

The variability profile for each loop dataset then identifies the desired characteristics of a given loop position for further introduction of diversity representation. These above results demonstrate that the diversity of loop amino acids introduced into a library can be “fine-tuned” depending on the threshold level of frequency of occurrence. These “fixed” loop positions attempt to replicate some of the natural diversity to promote possible structural stabilization effects.

This “fixing” of positions also has the effect of “narrowing” the diversity of variable positions in starting loop sequences. However, there can be the occasion to perform the reverse, that is, to obtain larger more diverse libraries. In this case, the “widening” effect is accomplished by raising the threshold of frequency of occurrence used to designate “fixed” amino acids. In this way, the variability profile will capture fewer of the most conserved loop positions and classify them as “fixed” positions. The remaining loop positions would be part of the broader “variable” amino acids that can be diversified.

Example 2B Identifying Fixed and Non-Fixed Loop Positions without Thresholds

In another embodiment it is possible to design natural-variation combinatorial diversity in each of the six selected loops (Table 4) without defining a variability threshold, i.e., where the selected threshold is 100%. In this embodiment each mutated loop is designed to contain amino acids that mimic its variability profile in terms of both variability and chemistry characteristics. At each specific loop position, oligonucleotide synthesis is optimized to contain a degenerate codon that would match/mimic the chemistry and the variability at that position. Positions having two or more amino acids in their variability profiles will be mutated regardless of the degree of variability of that position.

In BC loop sizes 11 (FIG. 24) and 14 only the W at position 22 will be kept fixed, resulting in the following mutagenesis pattern: BC/11=X21 W22 X23 X24 X25 X26 X27 X28 X29 X30 X31; BC/14=X21 W22 X23 X24 X25 X26 X26a X26b X26c X27 X28 X29 X30 X31. In BC loop size 15 there are 3 fixed positions (W22, P25 and G26c), resulting in the following mutagenesis pattern: BC/15=X21 W22 X23 X24 P25 X26 X26a X26b G26c X26d X27 X28 X29 X30 X31.

In DE loop size 6 all the positions are variable and therefore will have a starting sequence of DE_6=X51 X52 X53 X54 X55 X56.

The FG loop size 8 will be encoded by the starting sequence of FG/8=X76 X77 X78 X79 X80 X84 S85 S86. Finally, FG loop size 11 will be encoded as FG/11=X76 X77 X78 G79 X80 X81 X82 X83 X84 S85 S86.

Each of the positions denoted as X will encode for the amino acids present in the related variability profile or for a subset of chemically equivalent ones. In fact, in some cases two or more amino acids present at a certain position are chemically very similar. In such situations it is possible to include in the mutagenesis design only a subset of the amino acids and still preserve the natural chemistry characteristics of that position. This will both reduce the total number of mutants and give more flexibility for the optimization of the oligonucleotide synthesis.

Example 3 Methods for Designing WTM Loop Diversity for FN3 Domain Libraries with Threshold Constraints

In this example, methods for optimizing the loop diversity of an FN3 binding domain library are presented. The choice of candidate frameworks, as previously noted, dictates both the loop sizes to be introduced and the initial amino acid sequence selection. The method is illustrated particularly for the FG/11 loop.

To design the FG/11 loop library for FN3 based scaffolds, the variability profile considerations are as follows: As stated above, a “fixed” amino acid residue is determined to occur with a threshold frequency that is typically at least 40% (typically at least 50%) and is two-fold more frequent than the next most frequent amino acid for a given loop position. Upon inspection of the FN3 FG/11 variability profile (FIG. 14), it can be seen that the Gly at position 79 (G79) was found to occur at nearly 85%. The next most frequently occurring amino acid was at a frequency of less than 5% and did not register on the minimum threshold value (cut off by parser). When a given residue, in this case G79, is determined to occur at such a high frequency, it is highly conserved and thus represented in the libraries of the invention as “fixed,” meaning that it will not be mutated in the first round library diversification. Similarly, N76, G81 and S84 are also “highly conserved” as they occur at a frequency rate of 66%, 47%, and 61% respectively and are “fixed” in the first diversity library. In some cases, it can be that there are two semi-conserved amino acid residues at a given loop position. For example, G81 and S81 are found to occur at nearly 66% and 30% respectively. Hence, in this case, both amino acids are considered “semi-fixed” at position 81 as it appears that there is strong selective pressure to maintain G81 and S81 and no other amino acids. The 66% G81 and 30% S81 ratios can be maintained by controlled “doping” of the incoming nucleotide mixture (see EXAMPLE 5). The number of WTM substitutions per loop can be readily achieved by oligonucleotide synthesis doping (see, e.g., US20040033569A1 for technical details).

Positions N76, G79, G/S81, and S84 are fixed in candidate clones so that FG/11 has a starting “fixed” loop sequence of N76-77-78-79G-80-G/S81-82-83-S84 (fixed amino acids are indicated by their one-letter codes). The reason for not creating diversity at all sites is to restrict the initial diversity size of the library to facilitate efficient expression and display of all variants. These initially “fixed” positions indicate strong selective pressures for their preservation. However, they are still sites for look-through mutagenesis (LTM) during affinity maturation. In other words, the initial “fixed” positions can be later mutated. The overall goal in “fixing” positions in the first round library diversification is two fold: 1) to maximize the number of functional clones by incorporating most of the preferred loop residues and 2) to minimize the total library size. Subsequent LTM mapping of retrieved clones then precisely determines whether the “fixed” site is a “hot” or “cold” spot that can be mutated for further affinity maturation without losing overall binding function (see U.S. Patent Publication No. 20050136428).

The term “variable” amino acid residue refers to amino acid residues determined to occur with a lower frequency (less than the high threshold value of 20%) for a given residue position. Upon inspection of the FG/11 variability profile for example (FIG. 14), it can be seen that variable positions 77, 78, 80, 82 and 83 each have no single prevalent amino acid that occurs at higher than 40% frequency. For FG/11 positions 77, 78, 80, 82 and 83, each loop position had many different amino acids occurring at a fairly low frequency. Accordingly, positions 77, 78, 80, 82 and 83 are sites for creation of initial loop sequence diversity by mutagenesis, e.g., WTM. In introducing subsequent amino acid diversity, the FG/11 sequence would then be N76 X77 X78 79G X80 G/S81 X82 X83 S84, where X denotes a variable position at which initial mutagenesis (e.g., WTM) is conducted. The X position will consist of a starting “wild type” amino acid, the WTM replacement or a “co-product” amino acid depending on the WTM designed codon changes.

For example, to construct a WTM library within the FG/11 loop using lysine (K) as the WTM target amino acid in the starting sequence N76 X77 X78 79G X80 G/S81 X82 X83 S84, lysine residues can be substituted in positions 77, 78, 80, 82, and 83. The FG/11 WTM(K) library variants will have resulting sequences that are comprised of single replacements:

N76 K77 X78 79G X80 G/S81 X82 X83 S84,

N76 X77 K78 79G X80 G/S81 X82 X83 S84,

N76 X77 X78 79G K80 G/S81 X82 X83 S84,

N76 X77 X78 79G X80 G/S81 K82 X83 S84, and

N76 X77 X78 79G X80 G/S81 X82 K83 S84,

The FG/11 WTM(K) library variant sequences are additionally comprised of double replacements:

N76 K77 K78 79G X80 G/S81 X82 X83 S84,

N76 K77 X78 79G K80 G/S81 X82 X83 S84,

N76 K77 X78 79G X80 G/S81 K82 X83 S84,

N76 K77 X78 79G X80 G/S81 X82 K83 S84,

N76 X77 K78 79G K80 G/S81 X82 X83 S84,

N76 X77 K78 79G X80 G/S81 K82 X83 S84,

N76 X77 K78 79G X80 G/S81 X82 K83 S84,

N76 X77 X78 79G K80 G/S81 K82 X83 S84,

N76 X77 X78 79G K80 G/S81 X82 K83 S84,

N76 X77 X78 79G X80 G/S81 K82 K83 S84,

The FG/11 WTM(K) library variant sequences could be additionally comprised of triple replacements, for example:

N76 K77 K78 79G K80 G/S81 X82 X83 S84,

N76 K77 K78 79G X80 G/S81 K82 X83 S84,

N76 K77 K78 79G X80 G/S81 X82 K83 S84,

N76 K77 X78 79G K80 G/S81 K82 X83 S84,

N76 K77 X78 79G K80 G/S81 X82 K83 S84,

N76 K77 X78 79G X80 G/S81 K82 K83 S84,

N76 X77 K78 79G K80 G/S81 K82 X83 S84,

N76 X77 K78 79G K80 G/S81 X82 K83 S84,

The FG/11 WTM(K) library variant sequences could be additionally comprised of quadruple replacements:

N76 K77 K78 79G K80 G/S81 K82 X83 S84,

N76 K77 K78 79G K80 G/S81 X82 K83 S84,

N76 K77 K78 79G X80 G/S81 K82 K83 S84,

N76 K77 X78 79G K80 G/S81 K82 K83 S84,

N76 X77 K78 79G K80 G/S81 K82 K83 S84

Finally, the FG/11 WTM(K) library variant sequences could be additionally comprised of the quintuple replacement:

N76 K77 K78 79G K80 G/S81 K82 K83 S84.
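
The single through quintuple lysine replacements over the five variable FG/11 positions can be enumerated combinatorially, as in the following sketch. The names `TEMPLATE`, `VARIABLE` and `wtm_variants` are hypothetical; the starting sequence and the variable positions (77, 78, 80, 82 and 83) are those given above.

```python
from itertools import combinations

# Starting FG/11 sequence (nine N-terminal positions) as written in the text.
TEMPLATE = ["N76", "X77", "X78", "79G", "X80", "G/S81", "X82", "X83", "S84"]
VARIABLE = [1, 2, 4, 6, 7]  # list indices of positions 77, 78, 80, 82, 83

def wtm_variants(target="K", n_replacements=1):
    """Return every variant carrying `n_replacements` copies of the WTM target."""
    variants = []
    for chosen in combinations(VARIABLE, n_replacements):
        residues = list(TEMPLATE)
        for idx in chosen:
            residues[idx] = target + residues[idx][1:]  # e.g. X77 becomes K77
        variants.append(" ".join(residues))
    return variants

for n in range(1, 6):
    variants = wtm_variants("K", n)
    print(f"{n} replacement(s): {len(variants)} variants")
```

Enumerated this way, there are 5 single, 10 double, 10 triple, 5 quadruple and 1 quintuple lysine arrangements.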

For WTM™ diversity, each “X” position in the above example must have a starting “wild type” amino acid in order to perform the targeted codon changes. From the above variability profile analysis, FG/11 has a “wild type” consensus sequence of N76 A77 A78 79G V80 G/S81 P82 P83 S84. The residues A77, A78, P82, and P83 are chosen as starting “wild type” amino acids as the variability profile (FIG. 14) indicated that these were the most frequent amino acids at their respective positions.

In this instance, the effects of introducing an amine side chain by WTM (K) are explored in ligand binding. It is appreciated that the WTM (K) amino acids will occur only in those target “variable” positions and not in the “fixed” positions. The generated WTM diversity will include any “co-product” amino acids based on the degeneracy of the “wild type” amino acid of the target codon. For example, at position 77 the “wild type” amino acid is alanine (Ala). To perform lysine WTM at position 77 so that both Ala and Lys are present, the degenerate codon used is RMG. The “co-product” amino acids resulting from RMG will be threonine (T) and glutamate (E). The remaining WTM amino acids (G, S, H, L, P, Y, D, and Q) are then similarly replaced into the X variable positions of the FN3 FG_9 loop sequence N76 X77 X78 79G X80 G/S81 X82 X83 S84. It is appreciated that the WTM amino acids will occur along with their “co-product” amino acids.
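
The amino acids encoded by a degenerate codon such as RMG can be checked by expanding the IUPAC ambiguity codes against the standard genetic code, as in this sketch. The helper name `expand_codon` is hypothetical; the genetic code table and IUPAC codes are the standard ones.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "H": "ACT", "B": "CGT", "V": "ACG", "D": "AGT", "N": "ACGT",
}

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def expand_codon(degenerate):
    """Map each concrete codon of a degenerate codon to its amino acid."""
    choices = (IUPAC[base] for base in degenerate.upper())
    return {"".join(bases): CODON_TABLE["".join(bases)] for bases in product(*choices)}

for codon, aa in sorted(expand_codon("RMG").items()):
    print(codon, aa)
```

For RMG this yields the wild type alanine, the lysine target, and the threonine and glutamate co-products named above.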

WTM for FG/11 using the nine pre-chosen WTM™ amino acids produces a library diversity of approximately 2×9⁵ or 59049 (which includes WTM co-products). For comparative purposes, saturation mutagenesis of five FG/11 “variable” positions with all twenty amino acids would be 20⁵ or 3.2×10⁶. Accordingly, performing saturation mutagenesis at all of the FG/11 loop positions would create a library diversity of 20⁹ or 5×10¹¹, which is just at the limit of current library display and screening technology. However, it is important to note that this is for an FN3 loop library with just one mutated loop, and if the other FN3 loops (e.g. BC/11 and DE/6) were diversified by saturation mutagenesis, the minimum FN3 library would then be 20²⁶ or 6.7×10³³. In contrast, the minimum and maximum FN3 library based on identifying the “variable” BC, DE and FG loop positions are 6.7×10³³. This illustrates an advantage of the invention which, by contrast, allows for a smaller but more representative library to be constructed. Indeed, the methods of the invention provide for a manageable library for some loop positions in order to identify the first generation of binding molecules. Subsequent affinity maturation mutagenesis in the other CDR positions then optimizes those identified binding molecules.
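
The powers quoted in the preceding comparison can be reproduced with integer arithmetic, as sketched below. The variable names are hypothetical; the exponent 26 is taken here as the sum of the BC/11, DE/6 and nine-residue FG loop lengths, which is an interpretation of the text.

```python
# Nine WTM amino acids over five variable FG/11 positions.
wtm_five_positions = 9 ** 5

# Saturation mutagenesis with all twenty amino acids.
saturation_five_positions = 20 ** 5   # five variable FG/11 positions
saturation_fg_loop = 20 ** 9          # all nine FG loop positions
saturation_three_loops = 20 ** (11 + 6 + 9)  # BC/11 + DE/6 + FG (assumed sum)

print(f"9^5   = {wtm_five_positions}")
print(f"20^5  = {saturation_five_positions:.1e}")
print(f"20^9  = {saturation_fg_loop:.1e}")
print(f"20^26 = {saturation_three_loops:.1e}")
```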

In another approach, by-products can be avoided by employing look-through mutagenesis (LTM), which typically requires the synthesis of an oligonucleotide for each desired change but eliminates any by-products produced by degenerate codons.

A summary of identified loop sequences for use in the FN3 binding domain library of the invention is set forth below in Table 5. The names of the loops, sizes, and residue positions are standardized as previously discussed. Single-letter positions are fixed positions; two-letter positions are semi-fixed positions where the synthesis is performed with a mix in order to have only 2 targeted amino acids (example: S-G); and two-letter positions where the first letter is ‘X’ are variable positions for WTM diversity. The amino acid following the X is the most frequent amino acid based on that variability profile and is considered to be “wild type” (example: X-V).

TABLE 5 In some examples, where an FG/8 loop is identified by six amino acids, it is understood that these amino acids represent the six N-terminal amino acids of FG/8.

BC LOOP
        21   22   23   24   25   26   27   28   29   30   31
BC_11   X-S  W    X-T  X-P  P    X-P  X-G  X-P  X-V  X-D  X-G

        21   22   23   24   25   26   26a  26b  26c  27   28   29   30   31
BC_14   X-S  W    X-K  X-P  P    X-D  X-D  X-P  X-N  X-G  X-P  I    X-T  X-G

        21   22   23   24   25   26   26a  26b  26c  26d  27   28   29   30   31
BC_15   X-S  W    X-E  X-P  P    X-E  X-D  D    G    G    X-S  X-P  I    X-T  X-G

DE LOOP
        51   52   53   54   55   56
DE_6    X-P  X-G  X-T  X-E  X-T  X-S

FG LOOP
        76   77   78   79   80   84
FG_6    X    X    X    X    X    S

        76   77   78   79   80   84   85   86
FG_8    X-N  X-G  X-G  X-G  X-E  S    S    K

        76   77   78   79   80   81   82   83   84
FG_9    N    X    X    G    X    G/S  X    X    S

        76   77   78   79   80   81   82   83   84   85   86
FG_11   N    X-A  X-A  G    X-V  G/S  X-P  X-P  S    S    K

Example 4 Oligonucleotide Design for Introducing Loop Diversity Using WTM and Extended WTM

To design the BC/15 diversity (Table 5), a variability profile of the BC/15 loop positions from the filtered dataset was also performed using a threshold frequency of 50% (FIG. 11). The variability profile indicates that BC/15 would have a “wild type” starting sequence of: X-S21 W22 X-E23 X-P24 P25 X-E26 X-D26a D26b G26c G26d X-S27 X-P28 I29 X-T30 X-G31.

        21   22   23   24   25   26   26a  26b  26c  26d  27   28   29   30   31
BC/15   X-S  W    X-E  X-P  P    X-E  X-D  D    G    G    X-S  X-P  I    X-T  X-G

To perform WTM using Lys (K), one would introduce selective degenerate codons to place both the starting “wild type” amino acid and the WTM (K) amino acid in the selected “variable” position. The information presented here is also illustrated in FIGS. 16a-16i.

WTM:K   21   22   23   24   25   26   26a  26b  26c  26d  27   28   29   30   31
BC/15   K-S  W    K-E  K-P  P    K-E  K-D  D    G    G    K-S  K-P  I    K-T  K-G
WTM:K   arm  tgg  raa  mma  ccg  raa  raw  gat  ggc  ggc  arm  mma  att  ama  rra

The WTM (K) oligonucleotide sequence for BC/15 would then be: 5′-(arm)(tgg)(raa)(mma)(ccg)(raa)(raw)(gat)(ggc)(ggc)(arm)(mma)(att)(ama)(rra)-3′ (using degenerate base codes) (SEQ ID NO: 82), or 5′-(a a/g a/c)(tgg)(a/g a a)(a/c a/c a)(ccg)(a/g a a)(a/g a a/t)(gat)(ggc)(ggc)(a a/g a/c)(a/c a/c a)(att)(a a/c a)(a/g a/g a)-3′ using standard base codes.

The oligonucleotide to be used in the subsequent PCR reaction will contain both 5′ and 3′ flanking regions as shown here.

(SEQ ID NO:193) ACCACCATCACAATTARMTGGRAAMMACCGRAARAWGATGGCGGCARMMMAATTAMARRATTCCAAGTCGACGCA

The underlined portion encodes the varied BC loop as described above. The 5′ flanking sequence encodes the C-terminal portion of the B β strand domain and the 3′ flanking sequence encodes the N-terminal portion of the C β strand domain.
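As a minimal sketch of how the full-length oligonucleotide above is composed, the following Python snippet joins the WTM (K) codons for BC/15 to the two flanking segments. The 15-nucleotide flank boundaries are an assumption, inferred by removing the 15 loop codons from the sequence given as SEQ ID NO:193; the name `build_oligo` is illustrative only.

```python
# Flanking segments (assumed boundaries: what remains of SEQ ID NO:193
# once the 15 degenerate loop codons are removed).
FLANK_5 = "ACCACCATCACAATT"   # 3' end of the B beta-strand coding region
FLANK_3 = "TTCCAAGTCGACGCA"   # 5' end of the C beta-strand coding region

# WTM (K) codons for BC/15, positions 21 to 31 (including 26a-26d).
WTM_K_BC15 = "arm tgg raa mma ccg raa raw gat ggc ggc arm mma att ama rra".split()


def build_oligo(codons, five_prime=FLANK_5, three_prime=FLANK_3):
    """Concatenate 5' flank, degenerate loop codons and 3' flank."""
    return five_prime + "".join(codons).upper() + three_prime


oligo = build_oligo(WTM_K_BC15)
print(oligo, len(oligo))
```

The same helper can be reused with any other row of codons, provided the flanks for the relevant loop are substituted.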

For position 21, (arm) or (a a/g a/c): agc encodes S (wildtype) and aaa encodes K (WTM target). Additionally, aga encodes R and aac encodes N, both of which are “co-products” from the WTM mixed base reaction.

For position 22, (tgg) encodes W.

For position 23, (raa) or (a/g a a) encodes E or K.

For position 24, (mma) or (a/c a/c a) encodes T, K, Q or P.

For position 25, (ccg) encodes P.

For position 26, (raa) or (a/g a a) encodes E or K.

For position 26a, (raw) or (a/g a a/t) encodes N, E, K or D.

For position 26b, (gat) encodes D.

For position 26c, (ggc) encodes G.

For position 26d, (ggc) encodes G.

For position 27, (arm) or (a a/g a/c) encodes R, N, K or S.

For position 28, (mma) or (a/c a/c a) encodes T, K, Q or P.

For position 29, (att) encodes I.

For position 30, (ama) or (a a/c a) encodes T or K.

For position 31, (rra) or (a/g a/g a) encodes R, E, G or K.
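The position-by-position decoding above can be reproduced mechanically. The sketch below expands a degenerate codon with the standard IUPAC base codes and translates each member with the standard genetic code; the function names `expand` and `encoded` are illustrative and not part of the described method.

```python
from itertools import product

# Standard genetic code, bases ordered t, c, a, g ('*' marks a stop codon).
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# IUPAC degenerate base codes used in the WTM tables.
IUPAC = {"a": "a", "c": "c", "g": "g", "t": "t",
         "r": "ag", "y": "ct", "m": "ac", "k": "gt", "s": "cg", "w": "at"}


def expand(codon):
    """List every concrete codon covered by a degenerate codon."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in codon.lower()))]


def encoded(codon):
    """Set of amino acids (one-letter codes) encoded by a degenerate codon."""
    return {CODON_TABLE[c] for c in expand(codon)}


for degenerate in ["arm", "raa", "mma", "raw", "ama", "rra"]:
    print(degenerate, sorted(encoded(degenerate)))
```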

The WTM “co-products” contribute to extra library diversity. In following this exercise through for the remaining BC_15 loop codons, we will generate a limited WTM (K) loop library of 8192 possible combination variants. We then continue applying WTM for the remaining WTM amino acids: G, S, H, L, P, Y, E, and Q (see Table 5). Overall, in completing the nine WTM amino acid series, the total BC_15 loop library size is on the order of 10⁵.

TABLE 6 WTM oligonucleotide design for BC/15

BC_15   21   22   23   24   25   26   26a  26b  26c  26d  27   28   29   30   31   SEQ ID NO:
        X-S  W    X-E  X-P  P    X-E  X-D  D    G    G    X-S  X-P  I    X-T  X-G
WTM:K   arm  tgg  raa  mma  ccg  raa  raw  gat  ggc  ggc  arm  mma  att  ama  rra  82
WTM:G   rgc  tgg  gra  ssc  ccg  gra  grc  gat  ggc  ggc  rgc  ssc  att  rsc  ggc  83
WTM:D   kmt  tgg  gaw  smt  ccg  gaw  gat  gat  ggc  ggc  kmt  smt  att  rmc  grt  84
WTM:Q   mrs  tgg  sag  cmg  ccg  sag  sak  gat  ggc  ggc  mrs  cmg  att  mmg  srg  85
WTM:S   agc  tgg  rrm  yct  ccg  rrm  rrc  gat  ggc  ggc  agc  yct  att  asc  rgc  86
WTM:H   ymt  tgg  saw  cmt  ccg  saw  sat  gat  ggc  ggc  ymt  cmt  att  mmc  srt  87
WTM:Y   tmt  tgg  tat  ymt  ccg  tat  kat  gat  ggc  ggc  tmt  ymt  att  wmc  krt  88
                  gaa            gaa
WTM:L   tya  tgg  swg  cyg  ccg  swg  swt  gat  ggc  ggc  tya  cyg  att  myg  skg  89
WTM:P   yct  tgg  smg  ccg  ccg  smg  smt  gat  ggc  ggc  yct  ccg  att  mcg  ssg  90

SPLIT POOL Synthesis: WTM Y Oligonucleotides

For performing tyrosine (Y) WTM where the starting “wildtype” amino acid is glutamate (E), an unwanted side effect may be the introduction of a stop codon. Tyrosine is encoded by tat and tac, glutamate is encoded by gaa and gag, the amber stop codon is tag and the ochre stop codon is taa. Substituting a tyrosine for a glutamate, therefore, requires that the first position be a t for tyrosine or a g for glutamate, the second position be an a, and the third position be a t or c for tyrosine or an a or g for glutamate. Thus, a standard nucleotide mixture designed as described above may result in a taa or tag.
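The stop-codon problem described above can be checked by enumeration. The following sketch lists the stop codons contained in a degenerate codon; the single-reaction mixture is written as `kan`, that is (t/g)(a)(t/c/a/g), which is one reading of the mixture described in the text. The helper name `stop_codons` is illustrative.

```python
from itertools import product

# Standard genetic code, bases ordered t, c, a, g ('*' marks a stop codon).
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# Only the IUPAC codes needed for this example.
IUPAC = {"a": "a", "c": "c", "g": "g", "t": "t",
         "k": "gt", "w": "at", "n": "acgt"}


def stop_codons(degenerate):
    """Sorted list of stop codons covered by a degenerate codon."""
    codons = ("".join(p) for p in product(*(IUPAC[b] for b in degenerate)))
    return sorted(c for c in codons if CODON_TABLE[c] == "*")


single_reaction = stop_codons("kan")                  # mixed-base Y/E codon
split_pools = stop_codons("tat") + stop_codons("gaa")  # one codon per pool
print(single_reaction, split_pools)
```

Under these assumptions the mixed-base codon covers both taa and tag, while the two separate pools cover neither.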

In this situation it can be preferable to use an alternate oligonucleotide synthesis procedure in which pools of codons are synthesized separately, then combined and split for the following round of synthesis (E A Peters, P J Schatz, S S Johnson, and W J Dower, J. Bacteriol. 1994 July; 176(14): 4296-4305.). This split pool synthesis allows for the introduction of these (Y) WTM residues without producing unwanted co-products such as the amber stop codon that occurs in regular single reaction oligonucleotide synthesis. In the split pool synthesis process, two pools are utilized: the first pool utilizes the codon TAT, encoding Y, and the second pool utilizes the codon GAA, encoding E. In this split pool design, no stop codon is made, as neither reaction pool codes for the UAG or UAA stop codon. For BC_15 the (Y) WTM oligonucleotide reaction is made until position 26, whereupon the reaction is split into two separate columns for further synthesis.

In order to produce the defined mixture of amino acids, two columns are utilized. In the first, the fixed 3′ portion of the oligonucleotides is synthesized as defined by the flanking regions and the fixed portion of the BC_15 loop shown above. For position 26 in the example sequence above, the first column synthesizes the codon TAT (TAT in the 3′-5′ DNA synthesis). The second column synthesizes the codon GAA (AAG in the 3′-5′ DNA synthesis). After the three nucleotides are coupled, the two columns are opened, the synthesis support is removed by washing with acetonitrile, and the resins are pooled. After mixing, the resin is placed into one column. At this point, the next positions, 25 and 24, are synthesized. The single column synthesizes the codon CGG and YMT as described above. The resin is then split for (Y) WTM position 23 and reapportioned as described for position 26. After this, the syntheses are pooled and continued until the 5′ fixed and flanking regions are synthesized.
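The outcome of the split and pool steps can be sketched by enumeration. The snippet below takes the WTM:Y row for BC_15 from Table 6 and treats positions 23 and 26 as split points with a TAT column and a GAA column; it lists the resulting codon templates in 5′ to 3′ reading order, ignoring the 3′ to 5′ direction of chemical synthesis. The function name `split_pool_templates` is hypothetical.

```python
from itertools import product

# WTM:Y codons for BC_15 (Table 6). A tuple marks a split-pool position:
# one column couples TAT (Tyr), the other couples GAA (Glu).
BC15_WTM_Y = ["tmt", "tgg", ("tat", "gaa"), "ymt", "ccg", ("tat", "gaa"),
              "kat", "gat", "ggc", "ggc", "tmt", "ymt", "att", "wmc", "krt"]


def split_pool_templates(codons):
    """Enumerate the codon templates present after pooling all columns."""
    options = [c if isinstance(c, tuple) else (c,) for c in codons]
    return [list(choice) for choice in product(*options)]


templates = split_pool_templates(BC15_WTM_Y)
for t in templates:
    print(" ".join(t))
```

Two split positions with two columns each give four templates, and neither split position can carry taa or tag.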

WTM oligonucleotide design for the BC/14 and BC/11 loops is then carried out in similar fashion.

For FN3 BC/14, the variability profile is described as:

BC_14   21   22   23   24   25   26   26a  26b  26c  27   28   29   30   31
        X-S  W    X-K  X-P  P    X-D  X-D  X-P  X-N  X-G  X-P  I    X-T  X-G

TABLE 7 WTM oligonucleotide design for BC/14

BC_14   21   22   23   24   25   26   26a  26b  26c  27   28   29   30   31   SEQ ID NO
        X-S  W    X-K  X-P  P    X-D  X-D  X-P  X-N  X-G  X-P  I    X-T  X-G
WTM:K   arm  tgg  aaa  mma  ccg  raw  raw  mma  aaw  rra  mma  att  ama  rra  91
WTM:G   rgc  tgg  rra  ssc  ccg  grc  grc  ssc  rrc  ggc  ssc  att  rsc  ggc  92
WTM:D   kmt  tgg  raw  smt  ccg  gat  gat  smt  rat  grt  smt  att  rmc  grt  93
WTM:Q   mrs  tgg  mag  cmg  ccg  sak  sak  cmg  mak  srg  cmg  att  mmg  srg  94
WTM:S   agc  tgg  arm  yct  ccg  rrc  rrc  ycy  arc  rgc  ycy  att  asc  rgc  95
WTM:H   ymt  tgg  maw  cmt  ccg  sat  sat  cmt  mat  srt  cmt  att  mmc  srt  96
WTM:Y   tmt  tgg  aaa  ymt  ccg  kat  kat  ymt  wat  krt  ymt  att  wmc  krt  97
                  tat
WTM:L   tya  tgg  mwg  cyg  ccg  swt  swt  cyg  mwt  skg  cyg  att  myg  skg  98
WTM:P   yct  tgg  mmg  ccg  ccg  smt  smt  ccg  mmt  ssg  ccg  att  mcg  ssg  99

For FN3 BC_11, the variability profile is described as:

BC_11   21   22   23   24   25   26   27   28   29   30   31
        X-S  W    X-T  X-P  P    X-P  X-G  X-P  X-V  X-D  X-G

TABLE 8 WTM oligonucleotide design for BC/11

BC_11   21   22   23   24   25   26   27   28   29   30   31   SEQ ID NO
        X-S  W    X-T  X-P  P    X-P  X-G  X-P  X-V  X-D  X-G
WTM:K   arm  tgg  ama  mma  ccg  mma  rra  mma  rwg  raw  rra  100
WTM:G   rgc  tgg  rsc  ssc  ccg  ssc  ggc  ssc  gkc  grc  ggc  101
WTM:D   kmt  tgg  rmc  smt  ccg  smt  grt  smt  gwt  gat  grt  102
WTM:Q   mrs  tgg  mmg  cmg  ccg  cmg  srg  cmg  swg  sak  srg  103
WTM:S   agc  tgg  asc  yct  ccg  yct  rgc  yct  rkc  rrc  rgc  104
WTM:H   ymt  tgg  mmc  cmt  ccg  cmt  srt  cmt  swt  sat  srt  105
WTM:Y   tmt  tgg  wmc  ymt  ccg  ymt  krt  ymt  kwt  kat  krt  106
WTM:L   tya  tgg  myg  cyg  ccg  cyg  skg  cyg  stg  swt  skg  107
WTM:P   yct  tgg  mcg  ccg  ccg  ccg  ssg  ccg  svg  smt  ssg  108

The co-products formed in using the above oligonucleotide design for the BC_11 loop are shown in FIGS. 15 a-15 i.

For ¹⁰FN3 DE_6, the variability profile is described as:

DE LOOP 51   52   53   54   55   56
        X-P  X-G  X-T  X-E  X-T  X-S

Walkthrough of ¹⁰FN3 DE/6 is performed at the appropriate variable positions and listed in Table 9. As above, the X in the sequence refers to the walkthrough amino acid, and the amino acid(s) following the dash (-) refer to the starting base wild type amino acid.

TABLE 9 WTM oligonucleotide design for DE/6

DE_6    51   52   53   54   55   56   SEQ ID NO
        X-P  X-G  X-T  X-E  X-T  X-S
WTM:K   mma  rra  ama  raa  ama  arm  109
WTM:G   ssc  ggc  rsc  gra  rsc  rgc  110
WTM:D   smt  grt  rmc  gaw  rmc  kmt  111
WTM:Q   cmg  srg  mmg  sag  mmg  mrs  112
WTM:S   yct  rgc  asc  rrm  asc  agc  113
WTM:H   cmt  srt  mmc  saw  mmc  ymt  114
WTM:Y   ymt  krt  wmc  kaw  wmc  tmt  115
WTM:L   cyg  skg  myg  swg  myg  tya  116
WTM:P   ccg  ssg  mcg  smg  mcg  yct  117

The co-products formed in using the above oligonucleotide design for the DE/6 loop are shown in FIGS. 17 a and 17 b for walkthrough amino acids Q, K, H, G, S, L, P and Y in FIGS. 15 a and 15 b.

For FN3 FG/11, the variability profile is described as:

FG_9    76   77   78   79   80   81   82   83   84
        X-N  X-A  X-A  G    X-V  G/S  X-P  X-P  S

TABLE 10 WTM oligonucleotide design for FG/11

FG_9    76   77   78   79   80   81   82   83   84   SEQ ID NO
        X-N  X-A  X-A  G    X-V  G/S  X-P  X-P  S
WTM:K   aaw  rmg  rmg  ggc  rwg  rgc  mma  mma  agc  118
WTM:G   rrc  gsc  gsc  ggc  gkc  rgc  ssc  ssc  agc  119
WTM:D   rat  gmc  gmc  ggc  gwt  rgc  smt  smt  agc  120
WTM:Q   mak  smg  smg  ggc  swg  rgc  cmg  cmg  agc  121
WTM:S   arc  kct  kct  ggc  rkc  rgc  yct  yct  agc  122
WTM:H   mat  smc  smc  ggc  swt  rgc  cmt  cmt  agc  123
WTM:Y   wat  kmc  kmc  ggc  kwt  rgc  ymt  ymt  agc  124
WTM:L   mwt  syg  syg  ggc  stg  rgc  cyg  cyg  agc  125
WTM:P   mmt  scg  scg  ggc  syg  rgc  ccg  ccg  agc  126

The co-products formed in using the above oligonucleotide design for the FG_9 loop are shown in FIGS. 18 a-18 c.

For FN3 FG/6, the variability profile is described as:

        76   77   78   79   80   84
FG_6    X-N  X-G  X-G  X-G  X-E  S

TABLE 11 WTM oligonucleotide design for FG_6

        76   77   78   79   80   84   SEQ ID NO
FG_6    X-N  X-G  X-G  X-G  X-E  S
WTM:K   aaw  rra  rra  rra  raa  arm  127
WTM:G   rrc  ggc  ggc  ggc  gra  rgc  128
WTM:D   rat  grt  grt  grt  gaw  kmt  129
WTM:Q   mak  srg  srg  srg  sag  mrs  130
WTM:S   arc  rgc  rgc  rgc  rrm  agc  131
WTM:H   mat  srt  srt  srt  saw  ymt  132
WTM:Y   wat  krt  krt  krt  tat  tmt  133
                            gaa
WTM:L   mwt  skg  skg  skg  swg  tya  134
WTM:P   mmt  ssg  ssg  ssg  smg  yct  135

All 20 amino acids and unnatural amino acids utilizing the amber codon can potentially be walked through at the appropriate shaded positions.

Oligo construction using Table 11 was carried out using extended walkthrough and doping as follows.

Extended walkthrough (for loop) can also be performed at the appropriate loop positions where noted in the sequence denoted with an X. The X refers to the walkthrough amino acid, and the amino acid(s) following the dash (-) refer to the base wild-type amino acid and any required co-products denoted after a slash (/). Extended walkthrough is another example of WTM codon design where the co-product can be predetermined to occur. For example, the variability profiles have illustrated that G and P are frequent amino acids in the loop positions. The frequency of some G and P amino acids does not characterize them as either “fixed” or “wild-type” amino acids in the starting WTM sequence. However, by extended walkthrough, the degenerate WTM codons can be biased so that either G or P can be a WTM co-product.

TABLE 12

BC_14            21     22     23     24     25     26     26a    26b    26c    27     28     29     30     31     SEQ ID NO
Regular WTM      X-S    W      X-K    X-P    P      X-D    X-D    X-P    X-N    X-G    X-P    I      X-T    X-G
Extended WTM     X-S    W      X-K    X-P    P      X-D    X-D    X-P/G  X-N/G  X-G    X-P    I      X-T    X-G
WTM:K            arm    tgg    aaa    maa    ccg    raa    raw    gat    ggc    arm    mma    att    ama    rra    136
Extended WTM:K   arm    tgg    aaa    maa    ccg    raa    raw    mca    rrm    arm    mma    att    ama    rra    137

Here, a second BC/14 loop variation is given to exemplify the use of extended walkthrough mutagenesis with required co-products. The regular WTM and extended WTM design in Table 12 for BC_14 is shown above. In this case, regular WTM has P and N as the starting “wildtype” amino acids at positions 26b and 26c respectively. For WTM using K as the target amino acid, the required co-products are G at both positions.

For position 26b, we use the degenerate codon v(acg)ca. The codon aca codes for the WTM K amino acid, the codon cca codes P, the wildtype amino acid, and gca codes for G the required coproduct amino acid. Therefore a, c, and g are in the first position, c is in the second position, and a is in the third codon position.

Similarly, the degenerate codon rrm is used at position 26c. The codon aaa codes for K, aac codes for N, and gga codes for G. Therefore a and g are in the first position, a and g are in the second position, and c and a are in the third codon position.

These extended WTM (K) results are summarized below and the 5′ and 3′ flanking codons are the same as the regular WTM (K) oligonucleotides. This principle would be applied to the remaining WTM amino acids (G, S, H, L, P, Y, D, and Q) so that degenerate codons are designed to incorporate the WTM, wildtype and any required co-products.
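The position 26c reasoning, in which the bases needed at each codon position are collected and then written as one degenerate codon, can be sketched as follows. The helper `degenerate_codon` is a hypothetical name; it simply takes the per-position union of bases and looks up the matching IUPAC code, and `encoded` lists every amino acid the resulting codon covers under the standard genetic code.

```python
from itertools import product

# Standard genetic code, bases ordered t, c, a, g ('*' marks a stop codon).
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

IUPAC = {"a": "a", "c": "c", "g": "g", "t": "t", "r": "ag", "y": "ct",
         "m": "ac", "k": "gt", "s": "cg", "w": "at", "h": "act", "b": "cgt",
         "v": "acg", "d": "agt", "n": "acgt"}
CODE_FOR = {frozenset(bases): code for code, bases in IUPAC.items()}


def degenerate_codon(codons):
    """Smallest degenerate codon covering all of the given codons."""
    return "".join(CODE_FOR[frozenset(c[i] for c in codons)] for i in range(3))


def encoded(codon):
    """Set of amino acids encoded by a degenerate codon."""
    return {CODON_TABLE["".join(p)] for p in product(*(IUPAC[b] for b in codon))}


# Position 26c: K (aaa), wildtype N (aac) and required co-product G (gga).
codon_26c = degenerate_codon(["aaa", "aac", "gga"])
print(codon_26c, sorted(encoded(codon_26c)))
```

Because each position is mixed independently, the resulting codon also covers amino acids beyond the three requested; these are the additional co-products of the design.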

BC_14 21 22 23 24 25 26 26a 26b 26c 27 28 29 30 31 Regular X-S W X-K X-PP X-D X-D X-P X-N X-G X-P I X-T X-G WTM Extended X-S W X-K X-P P X-D X-DX-P/G X-N/G X-G X-P I X-T X-G WTM WTM:K arm tgg Aaa maa ccg raa Raa gatarm mma att ama rra

TABLE 13 Doping of base mixture incorporation of extended BC_14 WTM (K)oligonucleotide (SEQ ID NO: 138). 25 26 26a 26b 26c 27 5′- c c g a a a aa a a c a a a c a a a -3′ g g c g g a g c g 70 50 50 50 40 40 40 50 5030 50 50 25 60 60 60 50 50 25

Doping of the incoming base mixture can be performed to achieve predetermined amino acid ratios.

In position 26 of this example, it is desired that the WTM K (lysine) be incorporated into the loop 70% of the time as compared to the 30% of starting D (aspartic acid) wildtype amino acid. The usage of lysine is defined by the percentage of “a” utilized in the first codon position. Therefore, to achieve 70% lysine incorporation, the percentage of “a” in the mixture was introduced at 70% while “g” was introduced at 30% to achieve 30% aspartic acid.

In position 26a of this example, it is desired that the WTM K (lysine) be incorporated into the loop in equal 50% with 50% of starting D (aspartic acid) wildtype amino acid. Therefore, to achieve 50% lysine incorporation and 50% aspartic acid, the percentages of “a” and “g” were both adjusted to be introduced at 50% for oligonucleotide incorporation.

Similarly, in positions 96-99, the level of glycine incorporation was tuned to achieve an approximately 25% level of glycine incorporation while decreasing the level of co-product incorporation.
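The effect of doping on amino acid ratios can be estimated by multiplying the base fractions at each codon position. The sketch below does this for a `raw`-type codon with a 70/30 a/g mixture in the first position; the 50/50 a/t mixture in the third position and the function name `amino_acid_ratios` are assumptions made for illustration and are not taken from Table 13.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code, bases ordered t, c, a, g ('*' marks a stop codon).
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def amino_acid_ratios(mix):
    """mix: three dicts mapping base -> fraction, one per codon position."""
    ratios = defaultdict(float)
    for combo in product(*(position.items() for position in mix)):
        codon = "".join(base for base, _ in combo)
        probability = 1.0
        for _, fraction in combo:
            probability *= fraction
        ratios[CODON_TABLE[codon]] += probability
    return dict(ratios)


# First position doped 70% a / 30% g; third position 50/50 (assumed).
doped = amino_acid_ratios([{"a": 0.7, "g": 0.3}, {"a": 1.0}, {"a": 0.5, "t": 0.5}])
print(doped)
```

In this illustration the first-position ratio fixes the 70/30 split between the codons that begin with a (K, N) and those that begin with g (E, D).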

In the preferred usage, flanking regions are added to the 5′ and 3′ oligonucleotide regions to facilitate incorporation of the BC, DE and FG loops into the β-scaffold FN3 domain. The flanking regions have complementary base pairing with the oligonucleotides encoding other β-strands so that SOE-PCR can be performed (see FIG. 20). The following flanking scaffold 5′ and 3′ sequences are used for the BC, DE and FG loops:

B′ β-strand 16 17 18 19 20 BC loop 31 32 33 34 35 C′ β-strand ACC TCTTTG CTT ATA TCG GAT CAC GTA C (SEQ ID NO: 194) D′ β-strand 46 47 48 4950 DE loop 56 57 58 59 60 E′ β-strand F′ β-strand 71 72 73 74 75 FG loop85 86 87 88 89 G′ β-strand

Example 5 WTM Oligonucleotide Design (No Threshold)

In another embodiment it is possible to execute WTM mutagenesis for the entire length of each loop; all the positions in the loops were considered as variable (threshold=100%). Therefore this library is a superset of the one in the previous example. In addition, because of how WTM works, the most abundant amino acid at each position is encoded at each and every position in the loops with a frequency between 25% and 50%. Thus, the libraries will cover a much larger sequence space, yet still be biased towards the amino acid pattern found in the database.

Positions at the strand-loop boundary require special handling. In fact, residues at these positions might be very important for the stability of the framework. Therefore, if such positions are highly conserved in the related variability profile, they will be kept fixed (see S21, W22 and G31 in loop BC_11; S21 and W22 in loops BC_14 and BC_15; N76 in loops FG_8 and FG_11).
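The 25% to 50% range quoted above follows from the number of codons in each degenerate mixture. As a minimal sketch, assuming equimolar base mixtures at every degenerate position, the fraction of codons encoding the wild-type residue can be computed as below; `wild_type_fraction` is an illustrative name.

```python
from itertools import product

# Standard genetic code, bases ordered t, c, a, g ('*' marks a stop codon).
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
IUPAC = {"a": "a", "c": "c", "g": "g", "t": "t",
         "r": "ag", "y": "ct", "m": "ac", "k": "gt", "s": "cg", "w": "at"}


def wild_type_fraction(degenerate, wild_type):
    """Fraction of codons in an equimolar degenerate codon encoding wild_type."""
    codons = ["".join(p) for p in product(*(IUPAC[b] for b in degenerate))]
    return sum(CODON_TABLE[c] == wild_type for c in codons) / len(codons)


# WTM (K) codons with their wild-type residues.
for codon, wt in [("raa", "E"), ("mma", "P"), ("raw", "D"), ("rra", "G"), ("awa", "I")]:
    print(codon, wt, wild_type_fraction(codon, wt))
```

Two-codon mixtures such as raa give the wild type at 50%, and four-codon mixtures such as mma give it at 25%.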

TABLE 14 Summary of loop sequences for the universal Fibronectin III binding domains library

BC LOOP
        21   22   23   24   25   26   27   28   29   30   31
BC_11   S    W    X-T  X-P  X-P  X-P  X-G  X-P  X-V  X-D  G

        21   22   23   24   25   26   26a  26b  26c  27   28   29   30   31
BC_14   S    W    X-K  X-P  X-P  X-D  X-D  X-P  X-N  X-G  X-P  X-I  X-T  X-G

        21   22   23   24   25   26   26a  26b  26c  26d  27   28   29   30   31
BC_15   S    W    X-E  X-P  X-P  X-E  X-D  X-D  X-G  X-G  X-S  X-P  X-I  X-T  X-G

DE LOOP
        51   52   53   54   55   56
DE_6    X-P  X-G  X-T  X-E  X-T  X-S

FG LOOP
        76   77   78   79   80   84   85   86
FG_8    N    X-G  X-G  X-G  X-E  X-S  X-S  X-K

        76   77   78   79   80   81   82   83   84   85   86
FG_11   N    X-A  X-A  X-G  X-V  X-G  X-P  X-P  X-S  X-S  X-K

By following Table 14, the design of loop BC_15 will be as described here below:

        21   22   23   24   25   26   26a  26b  26c  26d  27   28   29   30   31
BC_15   S    W    X-E  X-P  X-P  X-E  X-D  X-D  X-G  X-G  X-S  X-P  X-I  X-T  X-G

To perform WTM using Lys (K), one would introduce selective degenerate codons to place both the starting “wild type” amino acid and the WTM (K) amino acid in the selected “variable” position.

        21   22   23   24   25   26   26a  26b  26c  26d  27   28   29   30   31
BC_15   S    W    X-E  X-P  X-P  X-E  X-D  X-D  X-G  X-G  X-S  X-P  X-I  X-T  X-G
WTM:K   agc  tgg  raa  mma  mma  raa  raw  raw  rra  rra  arm  mma  awa  ama  rra

The WTM (K) oligonucleotide sequence for BC_15 would then be: 5′-(agc)(tgg)(raa)(mma)(mma)(raa)(raw)(raw)(rra)(rra)(arm)(mma)(awa)(ama)(rra)-3′ (SEQ ID NO:82) (using degenerate base codes)

or 5′-(agc)(tgg)(a/g a a)(a/c a/c a)(a/c a/c a)(a/g a a)(a/g a a/t)(a/g a a/t)(a/g a/g a)(a/g a/g a)(a a/g a/c)(a/c a/c a)(a a/t a)(a a/c a)(a/g a/g a)-3′ using standard base codes.

The oligonucleotide to be used in the subsequent PCR reaction will contain both 5′ and 3′ flanking regions as shown here.

(SEQ ID NO:195) ACCACCATCACAATTAGCTGGRAAMMAMMARAARAWRAWRRARRAARMMMAAWAAMARRATTCCAAGTCGACGCA

The underlined portion encodes the varied BC loop as described above. The 5′ flanking sequence encodes the C-terminal portion of the B β strand domain and the 3′ flanking sequence encodes the N-terminal portion of the C β strand domain.

For position 21, (agc) encodes S.

For position 22, (tgg) encodes W.

For position 23, (raa) or (a/g a a) encodes E or K.

For position 24, (mma) or (a/c a/c a) encodes P, K, Q or T.

For position 25, (mma) or (a/c a/c a) encodes P, K, Q or T.

For position 26, (raa) or (a/g a a) encodes E or K.

For position 26a, (raw) or (a/g a a/t) encodes D, E, K or N.

For position 26b, (raw) or (a/g a a/t) encodes D, E, K or N.

For position 26c, (rra) or (a/g a/g a) encodes G, R, E or K.

For position 26d, (rra) or (a/g a/g a) encodes G, R, E or K.

For position 27, (arm) or (a a/g a/c) encodes S, N, K or R.

For position 28, (mma) or (a/c a/c a) encodes P, K, Q or T.

For position 29, (awa) or (a a/t a) encodes I or K.

For position 30, (ama) or (a a/c a) encodes T or K.

For position 31, (rra) or (a/g a/g a) encodes G, R, E or K.

This approach would be continued for the remaining WTM amino acids: G,S, H, L, P, Y, E, and Q (see Table 5).

TABLE 15 WTM oligonucleotide design for BC/15

        21   22   23   24   25   26   26a  26b  26c  26d  27   28   29   30   31   SEQ ID NO
BC_15   S    W    X-E  X-P  X-P  X-E  X-D  X-D  X-G  X-G  X-S  X-P  X-I  X-T  X-G
WTM:K   agc  tgg  raa  mma  mma  raa  raw  raw  rra  rra  arm  mma  awa  ama  rra  139
WTM:G   agc  tgg  gra  ssc  ssc  gra  grc  grc  ggc  ggc  rgc  ssc  rkc  rsc  ggc  140
WTM:D   agc  tgg  gaw  smt  smt  gaw  gat  gat  grt  grt  kmt  smt  rwt  rmc  grt  141
WTM:Q   agc  tgg  sag  cmg  cmg  sag  sak  sak  srg  srg  mrs  cmg  mwa  mmg  srg  142
WTM:S   agc  tgg  rrm  yct  yct  rrm  rrc  rrc  rgc  rgc  agc  yct  akc  asc  rgc  143
WTM:H   agc  tgg  saw  cmt  cmt  saw  sat  sat  srt  srt  ymt  cmt  mwt  mmc  srt  144
WTM:Y   agc  tgg  tat  ymt  ymt  tat  kat  kat  krt  krt  tmt  ymt  wwt  wmc  krt  145
                  gaa            gaa
WTM:L   agc  tgg  swg  cyg  cyg  swg  swt  swt  skg  skg  tya  cyg  wta  myg  skg  146
WTM:P   agc  tgg  smg  ccg  ccg  smg  smt  smt  ssg  ssg  yct  ccg  myt  mcg  ssg  147

WTM oligonucleotide design for the BC_14 and BC_11 loops is then carried out in similar fashion.

For FN3 BC_14, the variability profile is described as:

        21   22   23   24   25   26   26a  26b  26c  27   28   29   30   31
BC_14   S    W    X-K  X-P  X-P  X-D  X-D  X-P  X-N  X-G  X-P  X-I  X-T  X-G

TABLE 16 WTM oligonucleotide design for BC/14

        21   22   23   24   25   26   26a  26b  26c  27   28   29   30   31   SEQ ID NO
BC_14   S    W    X-K  X-P  X-P  X-D  X-D  X-P  X-N  X-G  X-P  X-I  X-T  X-G
WTM:K   agc  tgg  aaa  mma  mma  raw  raw  mma  aas  rra  mma  awa  ama  rra  148
WTM:G   agc  tgg  rra  ssc  ssc  grc  grc  ssc  rrt  ggc  ssc  rkc  rsc  ggc  149
WTM:D   agc  tgg  raw  smt  smt  gat  gat  smt  rat  grt  smt  rwt  rmc  grt  150
WTM:Q   agc  tgg  mag  cmg  cmg  sak  sak  cmg  mcw  srg  cmg  mwa  mmg  srg  151
WTM:S   agc  tgg  arm  yct  yct  rrc  rrc  yct  arc  rgc  yct  akc  asc  rgc  152
WTM:H   agc  tgg  maw  cmt  cmt  sat  sat  cmt  mat  srt  cmt  mwt  mmc  srt  153
WTM:Y   agc  tgg  aaa  ymt  ymt  kat  kat  ymt  wat  krt  ymt  wwt  wmc  krt  154
                  tat
WTM:L   agc  tgg  mwg  cyg  cyg  swt  swt  cyg  mwc  skg  cyg  wta  myg  skg  155
WTM:P   agc  tgg  mmg  ccg  ccg  smt  smt  ccg  mmt  ssg  ccg  myt  mcg  ssg  156

For FN3 BC_11, the variability profile is described as:

        21   22   23   24   25   26   27   28   29   30   31
BC_11   S    W    X-T  X-P  X-P  X-P  X-G  X-P  X-V  X-D  G

TABLE 17 WTM oligonucleotide design for BC/11

        21   22   23   24   25   26   27   28   29   30   31   SEQ ID NO
BC_11   S    W    X-T  X-P  X-P  X-P  X-G  X-P  X-V  X-D  G
WTM:K   agc  tgg  ama  mma  mma  mma  rra  mma  rwg  raw  ggc  157
WTM:G   agc  tgg  rsc  ssc  ssc  ssc  ggc  ssc  gkc  grc  ggc  158
WTM:D   agc  tgg  rmc  smt  smt  smt  grt  smt  gwt  gat  ggc  159
WTM:Q   agc  tgg  mmg  cmg  cmg  cmg  srg  cmg  swg  sak  ggc  160
WTM:S   agc  tgg  asc  yct  yct  yct  rgc  yct  rkc  rrc  ggc  161
WTM:H   agc  tgg  mmc  cmt  cmt  cmt  srt  cmt  swt  sat  ggc  162
WTM:Y   agc  tgg  wmc  ymt  ymt  ymt  krt  ymt  kwt  kat  ggc  163
WTM:L   agc  tgg  myg  cyg  cyg  cyg  skg  cyg  stg  swt  ggc  164
WTM:P   agc  tgg  mcg  ccg  ccg  ccg  ssg  ccg  syg  smt  ggc  165

The co-products formed in using the above oligonucleotide design for the BC_11 loop are shown in FIGS. 15 a and 15 b.

For DE_6, the variability profile is described as:

        51   52   53   54   55   56
DE/6    X-P  X-G  X-T  X-E  X-T  X-S

Walkthrough of DE/6 is performed at the appropriate variable positions and listed in Table 18. As above, the X in the sequence refers to the walkthrough amino acid, and the amino acid(s) following the dash (-) refer to the starting base wild type amino acid.

TABLE 18 WTM oligonucleotide design for DE/6

        51   52   53   54   55   56   SEQ ID NO
DE/6    X-P  X-G  X-T  X-E  X-T  X-S
WTM:K   mma  rra  ama  raa  ama  arm  166
WTM:G   ssc  ggc  rsc  gra  rsc  rgc  167
WTM:D   smt  grt  rmc  gaw  rmc  kmt  168
WTM:Q   cmg  srg  mmg  sag  mmg  mrs  169
WTM:S   yct  rgc  asc  rrm  asc  agc  170
WTM:H   cmt  srt  mmc  saw  mmc  ymt  171
WTM:Y   ymt  krt  wmc  kaw  wmc  tmt  172
WTM:L   cyg  skg  myg  swg  myg  tya  173
WTM:P   ccg  ssg  mcg  smg  mcg  yct  174

The co-products formed in using the above oligonucleotide design for the DE/6 loop are shown in FIGS. 17 a and 17 b.

In reference to the FG/11 loop the variability profile is described as:

        76   77   78   79   80   81   82   83   84   85   86
FG/11   N    X-A  X-A  X-G  X-V  X-G  X-P  X-P  X-S  X-S  X-K

TABLE 19 WTM oligonucleotide design for FG/11

        76   77   78   79   80   81   82   83   84   85   86   SEQ ID NO
FG/11   N    X-A  X-A  X-G  X-V  X-G  X-P  X-P  X-S  X-S  X-K
WTM:K   aat  rmg  rmg  rra  rwg  rra  mma  mma  arm  arm  aaa  175
WTM:G   aat  gsc  gsc  ggc  gkc  ggc  ssc  ssc  rgc  rgc  rra  176
WTM:D   aat  gmc  gmc  grt  gwt  grt  smt  smt  kmt  kmt  raw  177
WTM:Q   aat  smg  smg  srg  swg  srg  cmg  cmg  mrs  mrs  mag  178
WTM:S   aat  kct  kct  rgc  rkc  rgc  yct  yct  agc  agc  arm  179
WTM:H   aat  smc  smc  srt  swt  srt  cmt  cmt  ymt  ymt  maw  180
WTM:Y   aat  kmc  kmc  krt  kwt  krt  ymt  ymt  tmt  tmt  aaa  181
                                                          tat
WTM:L   aat  syg  syg  skg  stg  skg  cyg  cyg  tya  tya  mwg  182
WTM:P   aat  scg  scg  ssg  syg  ssg  ccg  ccg  yct  yct  mmg  183

The co-products formed in using the above oligonucleotide design for the FG/9 loop are shown in FIGS. 18 a-18 c.

In reference to the FG/8 loop the variability profile is described as:

        76   77   78   79   80   84   85   86
FG/8    N    X-G  X-G  X-G  X-E  X-S  X-S  X-K

TABLE 20 WTM oligonucleotide design for FG/8

        76   77   78   79   80   84   85   86   SEQ ID NO
FG/8    N    X-G  X-G  X-G  X-E  X-S  X-S  X-K
WTM:K   aat  rra  rra  rra  raa  arm  arm  aaa  184
WTM:G   aat  ggc  ggc  ggc  gra  rgc  rgc  rra  185
WTM:D   aat  grt  grt  grt  gaw  kmt  kmt  raw  186
WTM:Q   aat  srg  srg  srg  sag  mrs  mrs  mag  187
WTM:S   aat  rgc  rgc  rgc  rrm  agc  agc  arm  188
WTM:H   aat  srt  srt  srt  saw  ymt  ymt  maw  189
WTM:Y   aat  krt  krt  krt  tat  tmt  tmt  aaa  190
                            gaa            tat
WTM:L   aat  skg  skg  skg  swg  tya  tya  mwg  191
WTM:P   aat  ssg  ssg  ssg  smg  yct  yct  mmg  192

Example 6 Methods for Genetically Engineering a Fibronectin Binding Domain Library

In this example, the steps for making and assembling a universal fibronectin binding domain library using genetic engineering techniques are described. The approach in making the WTM FN3 libraries is summarized below and in FIG. 20.

Briefly, the fibronectin modules are cloned using standard molecular biology techniques. The oligonucleotides encoding the beta-strand scaffold framework and diversity loops of the variable regions are assembled by the single overlap extension polymerase chain reaction (SOE-PCR) as illustrated in FIG. 20. The full-length molecules are then amplified using flanking 5′ and 3′ primers containing restriction sites that facilitate cloning into the expression-display vector(s). The total diversity of the libraries generated depends on the number of framework sequences and number of positions in the loops chosen for mutagenesis, e.g., using WTM.

For example, the diversity of the FN3 BC/11, FN3 DE/6 and FN3 FG/11 library, using 9 amino acids to conduct WTM™, is 3.5×10⁶ (2² for FN3 BC/11×2⁶ for FN3 DE/6×2⁸ for the 13 amino acids FN3 FG/9) (see FIG. 21). The diversity of the is an upper limit and the diversity library from 10¹⁰ to 10¹¹ which is within the range of the transformation efficiencies of bacterial systems.

Accordingly, 5 oligonucleotides are synthesized to encompass the β-strand frameworks (strands A, B, C, D, E, F, and G) for the fibronectin module libraries. To build a library based on the module ¹⁰FN3 scaffold, the β-strand oligonucleotides are listed below:

A′ beta-strand (sense strand encoding the N-terminus of A beta strand with flanking region containing a BamHI restriction site): (SEQ ID NO:196)
5′-CATATTGCTGGTACTCAGGGATCCGTTAGTGACGTCCCA-3′

B′ beta-strand (antisense strand that spans the AB loop): (SEQ ID NO:197)
5′-TATAAGCAAAGAGGTAGGAGTCGCTGCAACCACTTCCAGATCCCGTGGGACGTCACTAAC-3′

C′ beta-strand (antisense strand that spans the CD loop): (SEQ ID NO:198)
5′-AACTGTGAATTCTTGAACTGGTGAGTTGCCACCTGTCTCTCCGTACGTGATCCGATA-3′

D′ beta-strand (antisense strand that spans the EF loop): (SEQ ID NO:199)
5′-CACTGCATAGACAGTAATGGTATAGTCGACACCGGGCTTAAGTCCAGAGATAGTTGC-3′

F′ beta-strand (antisense strand that encodes the C-terminus of the F beta-strand, with flanking region containing an XhoI restriction site): (SEQ ID NO:200)
5′-ATGAACTGGGTATACTCTCTCGAGCGTCCGATAGTTGATGGA-3′

The B′, C′, D′, and F′ oligonucleotides described above are depicted in FIG. 20.

In addition, there are two flanking oligonucleotides that code for the N-terminal linker region and a C-terminal linker that have His and Myc immunotags respectively. In addition, a subset of 30-60 degenerate oligonucleotides in FN3 BC/11, FN3 DE/6 and FN3 FG/9 loops are synthesized (total 90-180). These oligonucleotides are assembled by the SOE-PCR method to generate the libraries that include the necessary FN3 BC/11, FN3 DE/6 and FN3 FG/9 combinations.

In the SOE-PCR reactions, the oligonucleotides for β-strands and the designed combinations of the BC, DE and FG loop libraries are added. The PCR reactions used 5 μl of 10 μM oligonucleotide mix, 0.5 μl Pfx DNA polymerase (2.5 U/μl), 5 μl Pfx buffer (Invitrogen), 1 μl 10 mM dNTP, 1 μl 50 mM MgSO₄ and 37.5 μl dH₂O, run at 94° C. for 2 min, followed by 24 cycles of 30 sec at 94° C., 30 sec at 50° C., and 1 min at 68° C., and then incubated at 68° C. for 5 min. The reactions were performed using a programmable thermocycler (MJ Research). An aliquot of the above SOE-PCR reaction is then taken and added, together with the 5′ and 3′ primer pairs, to a new PCR reaction.

Random clones from each library are then chosen for sequence verification and assessment of library quality with respect to desired mutational diversity, unintended point mutations, deletions, and insertions. This efficiency contrasts with random/stochastic mutagenesis strategies where uncontrolled introduction of various bases produces higher levels of undesired base change effects leading to low expression or fibronectin binding domain functionality due to unfavorable amino acid usage and inadvertent stop codons.

Individual FN3 modules can then be subsequently linked together with natural FN linking sequences or by use of a synthetic poly-Gly-Ser linker (typically GGGGSGGGGSGGGGS) (SEQ ID NO:201) to generate multimeric binding domains.

Example 7 Methods for Performing High-Throughput Affinity Maturation of Candidates from a Universal Fibronectin Binding Domain Library

In this example, the steps for identifying and improving a candidate fibronectin binding domain using CBM (combinatorial beneficial mutations)/WTM/LTM affinity maturation are described.

Briefly, a known 9-14 TNFα fibronectin binding domain can be designated as a test clone and mutagenized (using, e.g., CBM/WTM/LTM technology), expressed, displayed, and improved according to the methods of the invention. Regarding loop diversity, LTM is used to explore small perturbations within the loops (e.g., one change per loop). For further improvement, combinatorial beneficial mutagenesis (CBM) is subsequently used to exhaustively incorporate the individual LTM changes into the loop landscape(s).

The test lysozyme fibronectin binding domain was expressed and displayed using phage display, although any of the above-mentioned yeast/bacterial display systems can also be used.

Example 8 Screening and Analysis of WTM 14/FN3 Libraries for Anti-TNFα Binding Activity

WTM Library Construction

The WTM fibronectin libraries were assembled by polymerase cycling assembly (PCA) and cloned into a phagemid vector. Each oligonucleotide shares fifteen base pairs of homology with its reverse-oriented neighboring oligonucleotide. The complete fibronectin gene was assembled using eight oligonucleotides. Oligo 3 contains the eleven amino acid BC loop, Oligo 5 encodes the six amino acid DE loop and Oligo 7 contains the 9 amino acid FG loop.

The 8 framework oligos for wildtype 14FN3 are:

Oligo 1: 14FN3_FMK_1_S (SEQ ID NO:202)
CATATTGCTGGTACTCAGGGATCCAACGTTAGTCCGCCT

Oligo 2: 14FN3_FMK_2_S (SEQ ID NO:203)
AATTGTGATGGTGGTCTCGGTGGCGTCTGTGACGCGGGCACGCCGAGGCGGACTAACGTT

Oligo 3: 14FN3_FMK_3_S (SEQ ID NO:204)
ACCACCATCACAATTAGCTGGCGCACTAAAACGGAAACAATCACCGGTTTCCAAGTCGACGCA

Oligo 4: 14FN3_FMK_4_A (SEQ ID NO:205)
GATTGTCCGCTGAATGGGAGTTTGGCCGTTGGCTGGGACTGCGTCGACTTGGAA

Oligo 5: 14FN3_FMK_5_S (SEQ ID NO:206)
ATTCAGCGGACAATCAAGCCTGACGTTCGTTCTTATACCATTACCGGG

Oligo 6: 14FN3_FMK_6_A (SEQ ID NO:207)
CAAAGTATACAGGTAGATTTTATAATCAGTTCCTGGCTGTAACCCGGTAATGGTATA

Oligo 7: 14FN3_FMK_7_S (SEQ ID NO:208)
TACCTGTATACTTTGAATGACAACGCGCGTAGCAGTCCGGTGGTTATAGATGCCAGC

Oligo 8: 14FN3_FMK_8_A (SEQ ID NO:209)
TGAACTGGGTATACTCTCTCGAGCGTGCTGGCATCTATAAC
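The fifteen base pair homology between neighboring oligonucleotides can be checked directly from the listed sequences. The sketch below compares the 3′ end of Oligo 1 with the reverse complement of Oligo 2, which is treated here as its reverse-oriented neighbor; the sequences are the ones listed above with the internal spaces of the listing removed, and the function names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def overlap_length(forward, reverse):
    """Longest 3' suffix of `forward` matching the start of revcomp(`reverse`)."""
    target = reverse_complement(reverse)
    for k in range(min(len(forward), len(target)), 0, -1):
        if forward[-k:] == target[:k]:
            return k
    return 0


OLIGO_1 = "CATATTGCTGGTACTCAGGGATCCAACGTTAGTCCGCCT"
OLIGO_2 = "AATTGTGATGGTGGTCTCGGTGGCGTCTGTGACGCGGGCACGCCGAGGCGGACTAACGTT"

overlap = overlap_length(OLIGO_1, OLIGO_2)
print(overlap, OLIGO_1[-overlap:])
```

The same check can be applied to each successive pair to confirm that the set tiles the full gene before ordering the oligonucleotides.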

The UF3L-WTM-TYR construction uses Oligo 1, Oligo 2, Oligo 4, Oligo 6, and Oligo 8 of the wildtype 14FN3 framework sequence. Oligo 3, Oligo 5A/Oligo 5B, and Oligo 7A/Oligo 7B are replaced with the following:

Oligo 3: 14FN3UL_w3_S_WTM-TYR (SEQ ID NO:210)
ACCACCATCACAATTAGCTGGWMCYMTYMTYMTKRTYMTKWTKATGGCTTCCAAGTCGACGCA

Oligo 5A: 14FN3UL_w5_S_WTM_TYR1 (SEQ ID NO:211)
ATTCAGCGGACAATCYMTKRTWMCTATWMCTMTTATACCATTACCGGG

Oligo 5B: 14FN3UL_w5_S_WTM_TYR2 (SEQ ID NO:212)
ATTCAGCGGACAATCYMTKRTWMCGAAWMCTMTTATACCATTACCGGG

Oligo 7A: 14FN3UL_w7_S_WTM_TYR1 (SEQ ID NO:213)
TACCTGTATACTTTGAATKMCKMCKRTKWTKRTYMTYMTTMTTMTAAACCGGTGGTTATAGATGCCAGC

Oligo 7B: 14FN3UL_w7_S_WTM_TYR2 (SEQ ID NO:214)
TACCTGTATACTTTGAATKMCKMCKRTKWTKRTYMTYMTTMTTMTTATCCGGTGGTTATAGATGCCAGC

The construction of the 14 amino acid extended BC loop library, UF3L-WTM-TYR-BC_extended, uses Oligo 1, Oligo 2, Oligo 4, Oligo 6, and Oligo 8 of the wildtype 14FN3 framework sequence. Oligo 3A, Oligo 3B, Oligo 3C, Oligo 3D, Oligo 5A/Oligo 5B, and Oligo 7A/Oligo 7B are the same as in the UF3L-WTM-TYR library.

Oligo 3A BC ext: 14FN3UL_w3_15_S_WTM_TYR1 (SEQ ID NO:215)
ACCACCATCACAATTAGCTGGTATYMTYMTTATKATKATKRTKRTTMTYMTWWTWMCKRTTTCCAAGTCGACGCA

Oligo 3B BC ext: 14FN3UL_w3_15_S_WTM_TYR2 (SEQ ID NO:216)
ACCACCATCACAATTAGCTGGTATYMTYMTGAAKATKATKRTKRTTMTYMTWWTWMCKRTTTCCAAGTCGACGCA

Oligo 3C BC ext: 14FN3UL_w3_15_S_WTM_TYR3 (SEQ ID NO:217)
ACCACCATCACAATTAGCTGGGAAYMTYMTTATKATKATKRTKRTTMTYMTWWTWMCKRTTTCCAAGTCGACGCA

Oligo 3D BC ext: 14FN3UL_w3_15_S_WTM_TYR4 (SEQ ID NO:218)
ACCACCATCACAATTAGCTGGGAAYMTYMTGAGKATKATKRTKRTTMTYMTWWTWMCKRTTTCCAAGTCGACGCA

All eight oligos are mixed together at 10 μM to assemble the fibronectingene by PCA. The 50 μl Assembly PCR 1 mixture contains 5 μl pooledoligos (10 μM), 10 μl 5× Buffer (Finnzymes), 1 μl 10 mM dNTPs, 0.5 μlPhusion Polymerase (Finnzymes), 33.5 μl diH₂O. The PCR protocol is onecycle of 98° C. for 30 sec, followed by 30 cycles of 98° C. for 7 sec,50° C. for 20 sec, and 72° C. for 15 sec. The final cycle is 72° C. for1 min.

The 100 μl Rescue PCR mixture contains 2.5 μl of the PCR 1 reaction, 2.5 μl of Oligo 1: 14FN3_FMK_1_S, 2.5 μl of Oligo 8: 14FN3_FMK_8_A, 2 μl 10 mM dNTPs, 20 μl 5× Buffer (Finnzymes), 1 μl Phusion Polymerase (Finnzymes), and 69.5 μl diH₂O. The PCR protocol is one cycle of 98° C. for 30 sec, followed by 30 cycles of 98° C. for 7 sec, 50° C. for 20 sec, and 72° C. for 15 sec. The final cycle is 72° C. for 1 min.

The PCR amplicons were purified using the Qiagen Qiaquick PCR Clean-up Kit by the manufacturer's protocol and eluted off the spin column with 80 μl diH₂O. The PCR fragment was digested at 37° C. with the restriction enzymes BamHI (NEB) and XhoI (NEB) and gel purified on a 1.2% agarose gel, and the DNA band was purified using the Qiagen Gel Extraction Kit.

The backbone DNA for the phage display libraries is a pBluescript-based phagemid vector. This vector features: (1) an N-terminal signal sequence from the E. coli DsbA gene for periplasmic export, (2) a multiple cloning site to insert the FN3-based scaffolds, (3) a c-myc epitope tag for monitoring protein expression, (4) a hexahistidine tag for protein purification, (5) the C-terminal domain (aa250-406) of the M13 p3 protein for display of the FN3 on the phage surface, and (6) an amber stop codon (TAG) inserted between the FN3 and the M13 p3 CTD, which allows switching between expression of membrane-anchored FN3-p3 fusion proteins and soluble FN3 proteins simply by using different E. coli strains. Expression of fusion proteins is under the control of an inducible lac promoter.

20 μg of phagemid vector was digested at 37° C. with the restriction enzymes BamHI (NEB) and XhoI (NEB) and with Calf Intestinal Alkaline Phosphatase (CIP from NEB) until completion. The backbone DNA was gel purified on a 1.2% agarose gel and the DNA band was purified using the Qiagen Gel Extraction Kit. Overnight ligations were set up with the phagemid backbone DNA and the gene-synthesized PCR fragment. The ligations were purified by phenol/chloroform extraction, ethanol precipitated, and electroporated into E. coli XL-1 Blue cells (Stratagene). Library transformation efficiency was determined by serial dilution titering.

The effective library size was 8.0 × 10⁷ variants for UF3L-WTM-TYR-BC_extended and 1.2 × 10⁸ variants for UF3L-WTM-TYR.

Phage Display

Phage particle library stocks were produced by starting with an initial inoculum of library TG1 cells at ten times the size of the library (media = 2YT, 1% glucose, 50 μg/ml ampicillin). When cells reached an OD₆₀₀ of 0.5, M13K07 helper phage (Invitrogen) was added at an MOI of 20 and the cells were incubated at 37° C. for 30 minutes. Next, the cells were shaken for an additional 30 minutes at 37° C. The cells were spun down, resuspended in induction media (2YT, 50 μg/ml ampicillin, 25 μg/ml kanamycin, and 0.1 mM IPTG), and shaken overnight at 30° C. Phage was purified from the supernatant by a standard 20% PEG/2.5 M NaCl precipitation protocol and titered by serial dilution in TG1 cells.

The following conditions were used for the panning of the library against the target TNF-α by phage display. The first three rounds of panning were performed on Reacti-Bind™ NeutrAvidin™ Coated High Binding Capacity (HBC) Clear 8-well Strips with Superblock® Blocking Buffer (Pierce), and round 4 was done on a MaxiSorp plate (Nunc). This eliminates any NeutrAvidin-binding phage and phage that recognize only the biotinylated TNF-α target protein. Negative controls were carried out with no biotinylated TNF-α (R&D Systems) added. Different blocking reagents, including BSA, ovalbumin, and instant milk, were used across the rounds of panning.

For Round 1, 0.25 μg of biotinylated TNF-α in 0.5% BSA blocking solution (100 μl) was bound to the plate for two hours and then washed with PBS-0.05% Tween-20. Next, the overnight binding buffer was added, which contained 1 × 10¹³ library phage particles, 1 μM biotin, 0.2% BSA, and 0.1% Tween-20 in PBS. The next day the wells were washed 5 times with 0.2% BSA, 0.1% Tween-20 in PBS and five more times with 0.05% Tween-20 in PBS. The phage was eluted with 100 μl of 100 mM HCl and neutralized in 1 ml of 1.0 M Tris-HCl, pH 8.0. The phage was titered on TG1 cells. For round 1, the output phage was 1.7 × 10⁷ phage/ml and the negative control was at 2.4 × 10⁶ phage/ml. This is a 7.1-fold enrichment.

For Round 2, 0.25 μg of biotinylated TNF-α in 0.5% ovalbumin blocking solution (100 μl) was bound to the plate for two hours and then washed with PBS-0.05% Tween-20. Next, the overnight binding buffer was added, which contained 1 × 10¹³ Round 1 phage particles, 1 μM biotin, 0.1 mg/ml streptavidin, 0.2% ovalbumin, and 0.1% Tween-20 in PBS. The next day the wells were washed 5 times with 0.2% ovalbumin, 0.1% Tween-20 in PBS and five more times with 0.05% Tween-20 in PBS. The phage was eluted with 100 μl of 100 mM HCl and neutralized in 1 ml of 1.0 M Tris-HCl, pH 8.0. The phage was titered on TG1 cells. For round 2, the output phage was 3.0 × 10⁶ phage/ml and the negative control was at 1.36 × 10⁶ phage/ml. This is a 2.2-fold enrichment.

For Round 3, 0.25 μg of biotinylated TNF-α in 0.5% instant milk blocking solution (100 μl) was bound to the plate for two hours and then washed with PBS-0.05% Tween-20. Next, the overnight binding buffer was added, which contained 1 × 10¹³ Round 2 phage particles, 1 μM biotin, 0.1 mg/ml streptavidin, 0.2% instant milk, and 0.1% Tween-20 in PBS. The next day the wells were washed 5 times with 0.2% instant milk, 0.1% Tween-20 in PBS and five more times with 0.05% Tween-20 in PBS. The phage was eluted with 100 μl of 100 mM HCl and neutralized in 1 ml of 1.0 M Tris-HCl, pH 8.0. The phage was titered on TG1 cells. For round 3, the output phage was 5.66 × 10⁷ phage/ml and the negative control was at 4.16 × 10⁷ phage/ml. This is a 1.4-fold enrichment.

For Round 4, 0.25 μg of biotinylated TNF-α in 0.5% instant milk blocking solution (100 μl) was bound to a MaxiSorp plate for two hours and then washed with PBS-0.05% Tween-20. Next, the overnight binding buffer was added, which contained 1 × 10¹³ Round 3 phage particles, 0.2% instant milk, and 0.1% Tween-20 in PBS. The next day the wells were washed 5 times with 0.2% instant milk, 0.1% Tween-20 in PBS and five more times with 0.05% Tween-20 in PBS. The phage was eluted with 100 μl of 100 mM HCl and neutralized in 1 ml of 1.0 M Tris-HCl, pH 8.0. The phage was titered on TG1 cells. For round 4, the output phage was 1.0 × 10⁶ phage/ml and the negative control was at 3.56 × 10⁵ phage/ml. This is a 2.8-fold enrichment.
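
The fold enrichment reported for each round is the ratio of the output phage titer to the negative-control titer. A minimal sketch in Python, using the titers stated above (the function and variable names are illustrative only):

```python
def fold_enrichment(output_titer, negative_titer):
    """Ratio of target-selected phage titer to the no-target control titer."""
    return output_titer / negative_titer


# (output phage/ml, negative control phage/ml) for each TNF-α panning round.
PANNING_TITERS = {
    1: (1.7e7, 2.4e6),
    2: (3.0e6, 1.36e6),
    3: (5.66e7, 4.16e7),
    4: (1.0e6, 3.56e5),
}

if __name__ == "__main__":
    for round_number, (output, negative) in PANNING_TITERS.items():
        ratio = fold_enrichment(output, negative)
        print(f"Round {round_number}: {ratio:.1f}-fold")
```

Rounded to one decimal place, these ratios reproduce the 7.1-, 2.2-, 1.4-, and 2.8-fold values given for rounds 1 through 4.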

ELISA Screening

After the third and fourth rounds of selection, a total of 373 clones were analyzed for TNF-α binding using phage ELISA. Briefly, phage from individual clones was produced by co-infection with M13K07 helper phage in 1 ml cultures in 96-deep well blocks. Phage-containing culture supernatants were then added to microtiter plates coated with the target protein. Following washes with PBS-0.1% Tween-20, bound phage was detected with a mouse anti-M13 antibody horseradish peroxidase conjugate (GE Biosciences) and TMB substrate (Pierce). Positive variants were then screened for binding specificity against VEGF and a NeutrAvidin-only plate. From these assays, 48 positive clones were identified. As shown in FIG. 24 b, these clones bound TNF-α but not VEGF or a blank well, indicating specific binding to TNF-α.

Three of these clones were selected for further analysis. The variants were expressed as soluble protein. Transforming the phagemid clone into the non-suppressing TOP10 E. coli strain produced soluble protein. Protein expression was induced with 1 mM IPTG for 4 hours at 30° C. and the 6×His-tagged protein was purified by Ni-NTA spin columns (Qiagen). ELISA determined the binding specificity of these variants. As seen in FIG. 24 a, all three variants bound TNFα but did not cross-react with VEGF, indicating specific binding to TNFα. The results of this screen demonstrated that the 14FN3 scaffold is suitable to generate specific binding proteins.

Sequencing

After sequencing the 48 positive clones, 9 unique variants were identified. Sequencing of the anti-TNFα variants revealed significant convergence of the sequences, particularly in the BC and FG loops, as several of the clones contained the same FG loop. Interestingly, all the variants recovered were from the tyrosine WTM libraries, with a clear preference for a BC loop length of 14 amino acids. In addition, all of the variants contained cysteine residues in the BC and FG loops, with a definite selection for cysteines at positions 26C in the extended BC loop and 82 in the FG loop. The presence of these cysteines suggests that a disulfide bond may form between the BC and FG loop in these variants. Formation of a disulfide bridge between the BC and FG loops could serve to stabilize the 14FN3 structure and bring the loops closer together, resulting in a larger contact surface with TNFα and a higher binding affinity.

The sequences of 8 high-affinity binders are shown in FIG. 24C, and the nine high-affinity binders are identified as SEQ ID NOS: 55-63.

Cloning 14FN3 TNFα Binder as Fc Construct

To examine the properties of a 14FN3-Fc fusion, the best anti-TNFα variant, R41A6, was expressed as an Fc fusion protein. The template DNA for the Fc was from clone MGC:39273 IMAGE:5440834 9 (Ref ID: BC024289). The following oligonucleotides were used to PCR amplify the R41A6 gene with NcoI and XhoI restriction sites and the Fc region of human IgG₁ with XhoI ends.

14FN3 NcoI For: (SEQ ID NO:219) ATCGCCATGGAAAACGTTAGTCCGCCTCGGCGT
14FN3 FMK 8 A: (SEQ ID NO:220) ATGAACTGGGTATACTCTCTCGAGCGTGCTGGCATCTATAAC
Fc XhoI For: (SEQ ID NO:221) ATCGCTCGAGCCCAAATCTTGTGACAAAAC
Fc XhoI Rev: (SEQ ID NO:222) CGATCTCGAGTTTACCCGGAGACAGGGAGA

The PCR reaction for A6 amplification contained 2 μl phagemid R41A6 mini-prep DNA (100 ng), 1 μl of 14FN3 NcoI For primer (20 μM), 1 μl of 14FN3 FMK 8 A primer (20 μM), 1 μl 10 mM dNTPs (NEB), 5 μl 10× Standard Taq Reaction Buffer (NEB), 1 μl Taq DNA Polymerase (NEB), and 39 μl diH₂O. The PCR reaction for Fc amplification contained 2 μl Origene clone BC024289.1 mini-prep DNA (100 ng), 1 μl of Fc XhoI For primer (20 μM), 1 μl of Fc XhoI Rev primer (20 μM), 1 μl 10 mM dNTPs, 5 μl 10× Standard Taq Reaction Buffer (NEB), 1 μl Taq DNA Polymerase (NEB), and 39 μl diH₂O. The PCR protocol is one cycle of 94° C. for 2 min, followed by 25 cycles of 94° C. for 30 sec, 60° C. for 30 sec, and 75° C. for 4 min. The final cycle is 75° C. for 3 min.

First, the TNF-α binder R41A6 gene was cloned into the pET-20b expression vector (Novagen). This vector contains the pelB signal sequence for periplasmic targeting of the protein and also a C-terminal hexahistidine tag for purification. 20 μg of pET-20b vector was digested at 37° C. with the restriction enzymes NcoI (NEB) and XhoI (NEB) and with Calf Intestinal Alkaline Phosphatase (CIP from NEB) until completion. The backbone DNA was gel purified on a 1.2% agarose gel, and the DNA band was purified using the Qiagen Gel Extraction Kit. The PCR amplicons were purified using the Qiagen Qiaquick PCR Clean-up Kit by the manufacturer's protocol and eluted off the spin column with 80 μl diH₂O. The R41A6 PCR fragment was digested at 37° C. with the restriction enzymes NcoI (NEB) and XhoI (NEB) and gel purified on a 1.2% agarose gel, and the DNA band was purified using the Qiagen Gel Extraction Kit. Overnight ligations were set up with the digested pET-20b backbone DNA and R41A6 PCR fragment. The ligation DNA (1 μl) was electroporated into TOP10 cells (Invitrogen) and clones were selected on LB-AMP plates. Clone pET-20b+TNF A6 #27 was recovered after screening mini-prep DNA by restriction digest and sequencing.

Next, the Fc fragment was cloned into the XhoI site of the vector pET-20b+TNF A6 #27. 20 μg of pET-20b+TNF A6 vector was digested at 37° C. with the restriction enzyme XhoI (NEB) and with Calf Intestinal Alkaline Phosphatase (CIP from NEB) until completion. The backbone DNA was gel purified on a 1.2% agarose gel, and the DNA band was purified using the Qiagen Gel Extraction Kit. The PCR amplicons were purified using the Qiagen Qiaquick PCR Clean-up Kit by the manufacturer's protocol and eluted off the spin column with 80 μl diH₂O. The Fc PCR fragment was digested at 37° C. with the restriction enzyme XhoI (NEB) and gel purified on a 1.2% agarose gel, and the DNA band was purified using the Qiagen Gel Extraction Kit. Overnight ligations were set up with the digested pET-20b+TNF A6 #27 (A6) backbone DNA and Fc PCR fragment. The ligation DNA (1 μl) was electroporated into TOP10 cells (Invitrogen) and clones were selected on LB-AMP plates. Clone pET-20b+TNF A6+Fc #39 (A6+Fc) was recovered after screening mini-prep DNA for the correct orientation by restriction digest and sequencing.

The A6 and A6+Fc clones were expressed in BL-21(DE3) cells (Invitrogen) and purified using Talon Magnetic beads (Invitrogen). The A6 and A6+Fc clones in BL-21 were grown to a density of OD₆₀₀=0.6 in LB media with 100 μg/ml ampicillin. At this density, IPTG was added to a final concentration of 1 mM, and the cells were put in the shaker at 30° C. for 4 hours. Cell pellets were frozen overnight at −80° C.

Thawed cell pellets were resuspended in lysis buffer (50 mM Sodium phosphate, pH 8, 500 mM NaCl, 40 mM Imidazole, 5% glycerol, 1× Bugbuster detergent (EMD), 1× protease inhibitors (Sigma), and 1 μl Lysonase (EMD)) and mixed for 20 minutes at 4° C. Lysate was spun down at 14,000 rpm for 20 minutes at 4° C. and mixed with Talon magnetic beads for 1 hour at 4° C. Beads were pelleted on a magnet and washed five times with wash buffer (50 mM Sodium phosphate, pH 8, 500 mM NaCl, 40 mM Imidazole, 5% glycerol, and 0.01% Tween-20). Protein was eluted two times with 350 μl elution buffer (50 mM Sodium phosphate, pH 8, 500 mM NaCl, 150 mM Imidazole, 5% glycerol, and 0.01% Tween-20) and buffer exchanged into PBS using a 2 ml Zeba desalting column (Pierce). Protein concentration was quantified using the CBQCA Protein Quantification Kit (Invitrogen).

The K_(a), K_(dis), and K_(D) values for soluble protein A6 and A6+Fc were determined using an Octet instrument (ForteBio), which measures refracted surface light interference. The target protein was biotinylated TNF-α soluble protein (R&D Systems), used at a concentration of 0.25 μg/ml for loading onto the streptavidin sensor tips. The A6 protein was loaded at 290 nM and the A6+Fc protein was loaded at 370 nM. The Octet was run using the manufacturer's protocol for Basic Kinetic Assay. The K_(D) for A6 is between 20 and 100 nM and the K_(D) for A6+Fc is between 12 and 25 nM. The K_(off) for A6+Fc was improved 2-fold as compared to the monomeric A6 protein. These results demonstrate that adding the Fc region to a 14FN3 binder can improve its binding properties.

Kinetic Binding Summary

Protein    Kon [1/Ms]    Koff [1/s]    KD
A6 + Fc    9.12E+03      2.26E−04      24.8 nM
A6 + Fc    1.49E+04      2.20E−04      14.9 nM
A6 + Fc    1.95E+04      2.29E−04      11.7 nM
A6         2.37E+04      4.38E−04      18.4 nM
A6         2.74E+03      2.67E−04      97.4 nM

Example 9 Screening and Analysis of Fn3 Libraries for Anti-VEGF Binding

In order to test the functionality of our combinatorial library design, a 14FN3 library with 2 natural-variant combinatorial loops (BC/11 and DE/6) was designed to select binders. The library was constructed to contain the natural-variant amino acids or chemical equivalents shown at the top in FIG. 25B for the BC loop. A comparison of the natural amino acid variants given in FIG. 9 and those incorporated into the BC loop (FIG. 25A) shows that in general, the highest occurring 1-6 amino acid variants or their chemical equivalents were selected for at each position, where the number selected depended on the frequency distribution of the variants, minimizing where necessary codon changes leading to co-produced amino acids, such that the average number of variants per residue was somewhat greater than 4, yielding a total diversity of about 10⁶. Similarly, the DE/6 loop was constructed to contain the natural-variant amino acids or chemical equivalents shown at the top in FIG. 25B for the DE loop. A comparison of the natural amino acid variants given in FIG. 12 and those incorporated into the DE loop (FIG. 25A) shows that in general, the highest occurring 6 amino acid variants or their chemical equivalents were used at each position, except for position 1 where all nine variants were used. The greater average number of variants at each position was guided, in part, by the smaller number of residue positions, allowing more residues at each position while still retaining a total diversity of less than about 10⁵.
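
The theoretical diversity of a combinatorial loop is the product of the number of amino acid variants allowed at each position. The Python sketch below illustrates this. The DE/6 counts follow the description above (nine variants at position 1, six at each of the other five positions). The per-position BC/11 counts are not given in the text, so a flat four variants per position is used purely as an assumed lower bound on the stated average.

```python
from math import prod


def loop_diversity(variants_per_position):
    """Number of distinct loop sequences for the given per-position variant counts."""
    return prod(variants_per_position)


# DE/6 loop as described: nine variants at position 1, six at positions 2-6.
DE6_COUNTS = [9, 6, 6, 6, 6, 6]

# BC/11 loop: assumed flat count of four per position (illustrative only).
BC11_COUNTS_ASSUMED = [4] * 11

if __name__ == "__main__":
    print(f"DE/6 diversity:  {loop_diversity(DE6_COUNTS):,}")
    print(f"BC/11 diversity: {loop_diversity(BC11_COUNTS_ASSUMED):,} (assumed counts)")
```

With these inputs the DE/6 loop gives 69,984 sequences, below the stated limit of about 10⁵, and the assumed BC/11 counts give roughly 4 × 10⁶, the same order of magnitude as the stated diversity of about 10⁶.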

The library was constructed in the phagemid vector as described above. To screen the library, phage were incubated with biotinylated VEGF(121) that had been pre-bound to streptavidin-coated magnetic beads for 1 hr at room temperature in PBS-2.5% nonfat dry milk. Following washes with PBS-0.1% Tween-20, bound phage were eluted with 0.1N HCl and propagated in TG1 cells for the next round of selection. In round 1 the VEGF concentration used was 1 μM; this was reduced 5-fold in each of rounds 2 and 3 and 2-fold in round 4 (Table 20), such that the final round of selection contained 20 nM VEGF. To measure relative enrichment, phage prepared from a non-displaying control phagemid (pBC), which confers chloramphenicol resistance, were also included in the selection reactions. In each round a 2 to 5-fold enrichment of library phage as compared to the non-displaying control phage was observed (Table 20).

TABLE 20 VEGF 14FN3 Combinatorial Library Selection

Round    [VEGF]    Phage Titer    Enrichment
1        1 μM      1.6 × 10⁵      ND
2        0.2 μM    1.9 × 10⁵      5
3        40 nM     4.0 × 10⁵      2
4        20 nM     4.2 × 10⁵      3

After the selections, individual clones were screened for binding to VEGF by phage ELISA. A total of 380 clones were screened from rounds 3 and 4, identifying 37 positive clones (FIG. 25 a). These positive clones were rescreened by phage ELISA for binding specificity against VEGF, TNFα or a blank plate.

Of these 37 clones, 35 specifically bound VEGF. Sequencing of these clones identified 3 unique variants (FIG. 25 b). One variant, R1D4, was found in 29/35 clones, while the other two were seen only once. This indicates significant convergence on a single sequence from the combinatorial library.

The most abundant clone, R1D4, was selected for further analysis. R1D4 was expressed as a soluble protein and purified using a Ni-chelating column. Using the Octet, the dissociation rate for binding to VEGF was measured as 1.7×10⁻⁴ (FIG. 25 c). This is comparable to an anti-VEGF antibody which had a Koff value of 1.1×10⁻⁴. These findings validate the combination of combinatorial library design and 14FN3 scaffold as able to generate binders with significant affinity to their target.

Example 10 Screening and Analysis of FN3 Libraries for Anti-HMGB1 Binding Activity

HMGB1 Screen of 14FN3 Combinatorial Universal Library

The 14FN3 natural-variant combinatorial library from Example 9 was also screened against another therapeutic target, HMGB1 (FIG. 26A shows the sequence variation at each position of the BC/11 and DE/6 loops). HMGB1 was initially described as a ubiquitous DNA-binding protein critical for chromatin stabilization and transcriptional regulation. More recently, extracellular HMGB1, when released by either necrotic cells or activated immune cells, has been shown to be an important pro-inflammatory cytokine. To identify 14FN3-based antibody mimics against HMGB1, the 14FN3 2 loop (BC/DE) combinatorial library was screened for HMGB1 binders. Three rounds of phage display selection were performed as described above for VEGF. In round 1, either 0.5 or 0.1 μM HMGB1 was used. This was reduced 2.5-fold each round such that the final round selections contained either 80 or 16 nM HMGB1. In the final round of selection a 900 to 1300-fold enrichment of library phage over a non-displaying control phage was observed (Table 11).
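
The stepwise reduction in target concentration can be written as a simple geometric schedule. The following Python sketch is illustrative only and uses a hypothetical function name. It reproduces the two HMGB1 selection series from the stated starting concentrations and the 2.5-fold reduction per round.

```python
def concentration_schedule(start_nM, fold_reduction, rounds):
    """Target concentration (nM) for each round, reduced by a fixed factor per round."""
    return [start_nM / fold_reduction ** i for i in range(rounds)]


if __name__ == "__main__":
    for start in (500, 100):  # 0.5 μM and 0.1 μM starting HMGB1, in nM
        series = concentration_schedule(start, 2.5, 3)
        print(start, [f"{c:g} nM" for c in series])
```

Starting from 0.5 μM the series is 500, 200, and 80 nM; starting from 0.1 μM it is 100, 40, and 16 nM.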

TABLE 11 HMGB1 14FN3 Combinatorial Library Selection

           [HMGB1]    Phage Titer    Enrichment
Round 1    0.5 μM     6.3 × 10⁴      ND
           0.1 μM     1.2 × 10⁵
Round 2    0.2 μM     5.4 × 10⁵      16
           40 nM      5.6 × 10⁵      7
Round 3    80 nM      2.6 × 10⁷      1300
           16 nM      4.4 × 10⁷      898

Following the selections, 439 individual clones from rounds 2 and 3 were screened by phage ELISA for binding to HMGB1. This screen resulted in 143 positive clones. Sequencing of 50 of these clones identified 15 unique 14FN3 variants (FIG. 26B), having sequences identified by SEQ ID NOS: 67-81. The most common variant was found in 22/50 clones, while 4 other sequences were also isolated multiple times. Interestingly, we observed significant similarity between the sequences, indicating convergence on a common set of traits. There was a clear selection for basic amino acids in the DE loop, in particular a KR(S/T)-H motif. The BC loop also contains several conserved residues, such as the valine or isoleucine at positions 24 and 29 and the arginine at positions 21 and 31. The similarities between the variants suggest that they may recognize a common epitope in HMGB1.

To analyze these variants effectively, a small-scale expression system to produce soluble protein was developed. Each clone was transformed into the non-suppressing E. coli strain NEB Express Iq. Next, a 2 ml culture was induced with 1 mM IPTG for 6 hrs at 30° C. The 6×His-tagged 14FN3 variants were then purified using Talon magnetic beads (Invitrogen). This procedure yielded from 5 to 10 μg of each soluble protein.

To determine the relative affinities of the variants, the dissociation rates of the individual proteins were measured using the Octet. Since the dissociation rate is the largest contributor to binding affinity and is not dependent on protein concentration, this analysis allowed us to quickly screen the variants and select the best proteins for further characterization. The clones displayed a range of dissociation rates from 1.4×10⁻² to 1.0×10⁻⁶, with the majority between 1.5×10⁻⁴ and 2.0×10⁻⁴. Dissociation rates in this range suggest that these binders have a high affinity for HMGB1. We then selected six candidates for further investigation. We measured association and dissociation rates using the Octet (FIG. 26C). Table 22 shows that the obtained K_(d) values are in the low-nanomolar range.

Variant    k_(on)/10⁴ (M⁻¹ s⁻¹)    k_(off)/10⁻⁴ (s⁻¹)    K_(d) (nM)
R2E6       8.3                     5.0                   6.0
R2G2       3.7                     2.6                   6.9
R2F8       9.9                     2.2                   2.2
R2B5       8.2                     2.5                   3.1
P2C6       1.8                     6.2                   35

Kinetic constants were determined from measurements using an Octet instrument with biotinylated HMGB1 immobilized on streptavidin biosensors. Measurements were performed at 30° C. Protein concentrations were determined using a CBQCA protein quantitation assay (Invitrogen) with a BSA standard curve. The equilibrium dissociation constant, K_(d), is calculated from the ratio of the rate constants, k_(off)/k_(on).
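
As a worked illustration of the k_(off)/k_(on) relationship, the Python sketch below recomputes K_(d) from the tabulated rate constants. The function name and data structure are illustrative only. Because the published rate constants are rounded to two significant figures, small differences from the tabulated K_(d) values are to be expected.

```python
def kd_nanomolar(k_on, k_off):
    """Equilibrium dissociation constant in nM from k_on (1/(M*s)) and k_off (1/s)."""
    return k_off / k_on * 1e9


# (k_on in M^-1 s^-1, k_off in s^-1) taken from the table above.
RATE_CONSTANTS = {
    "R2E6": (8.3e4, 5.0e-4),
    "R2G2": (3.7e4, 2.6e-4),
    "R2F8": (9.9e4, 2.2e-4),
    "R2B5": (8.2e4, 2.5e-4),
    "P2C6": (1.8e4, 6.2e-4),
}

if __name__ == "__main__":
    for variant, (k_on, k_off) in RATE_CONSTANTS.items():
        print(f"{variant}: K_d = {kd_nanomolar(k_on, k_off):.1f} nM")
```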

Example 11 Screening and Analysis of FN3 Libraries for Catalytic Function

Protease Activity Plate Assays

In the case where the activity to be assayed is a proteolytic activity, substrate-containing nutrient plates can be used for screening for colonies which secrete a protease. Protease substrates such as denatured hemoglobin can be incorporated into nutrient plates (Schumacher, G. F. B. and Schill, W. B., Anal. Biochem., 48: 9-26 (1972); Benyon and Bond, Proteolytic Enzymes, 1989 (IRL Press, Oxford) p. 50). FN3 variant libraries can be displayed on yeast cell surfaces or bacterial cell surfaces (Rutherford, 2006; Varadarajan, 2005) as fusion proteins.

Alternatively, when bacterial colonies capable of secreting a protease are grown on these plates, the colonies are surrounded by a clear zone, indicative of digestion of the protein substrate present in the medium. A protease must meet several criteria to be detected by this assay. First, the protease must be secreted into the medium where it can interact with the substrate. Second, the protease must cleave several peptide bonds in the substrate so that the resulting products are soluble, and a zone of clearing results. Third, the cells must secrete enough protease activity to be detectable above the threshold of the assay. As the specific activity of the protease decreases, the threshold amount required for detection in the assay will increase. Proteases that are produced as phage p3 fusion proteins are also capable of enzymatic function (REF).

One or more protease substrates may be used. For example, hemoglobin (0.05-0.1%), casein (0.2%), or dry milk powder (3%) can be incorporated into appropriate nutrient plates. Colonies can be transferred from a master plate using an inoculating manifold, by replica-plating or other suitable method, onto one or more assay plates containing a protease substrate. Following growth at 37° C. (or the appropriate temperature), zones of clearing are observed around the colonies secreting a protease capable of digesting the substrate. Four proteases of different specificities and reaction mechanisms were tested to determine the range of activities detectable in the plate assay. The enzymes included elastase, subtilisin, trypsin, and chymotrypsin. Specific activities (elastase, 81 U/mg powder; subtilisin, 7.8 U/mg powder; trypsin, 8600 U/mg powder; chymotrypsin, 53 U/mg powder) were determined by the manufacturer. A dilution of each enzyme, elastase, subtilisin, trypsin, and chymotrypsin, was prepared and 5 μl aliquots were pipetted into separate wells on each of three different assay plates.

Plates containing casein, dry milk powder, or hemoglobin in a 1% Difco bacto agar matrix (10 ml per plate) in 50 mM Tris, pH 7.5, 10 mM CaCl₂ buffer were prepared. On casein plates (0.2%), at the lowest quantity tested (0.75 ng of protein), all four enzymes gave detectable clearing zones under the conditions used. On plates containing powdered milk (3%), elastase and trypsin were detectable down to 3 ng of protein, chymotrypsin was detectable to 1.5 ng, and subtilisin was detectable at a level of 25 ng of protein spotted. On hemoglobin plates, at concentrations of hemoglobin ranging from 0.05 to 0.1 percent, 1.5 ng of elastase, trypsin and chymotrypsin gave detectable clearing zones. On hemoglobin plates, under the conditions used, subtilisin did not yield a visible clearing zone below 6 ng of protein.

1. A method of forming a library of fibronectin Type 3 (FN3) domain polypeptides useful in screening for the presence of one or more polypeptides having a selected binding or enzymatic activity, comprising (i) aligning BC, DE, and FG amino acid loop sequences in a collection of native fibronectin Type 3 domain polypeptides, (ii) segregating the aligned loop sequences according to loop length, (iii) for a selected loop and loop length from step (ii), performing positional amino acid frequency analysis to determine the frequencies of amino acids at each loop position, (iv) for each loop and loop length analyzed in step (iii), identifying at each position a conserved or selected semi-conserved consensus amino acid and other natural-variant amino acids, (v) for at least one selected loop and loop length, forming: (1) a library of walk-through mutagenesis sequences expressed by a library of coding sequences that encode, at each loop position, the consensus amino acid, and if the consensus amino acid has an occurrence frequency equal to or less than a selected threshold frequency of at least 50%, a single common target amino acid and any co-produced amino acids, or (2) a library of natural-variant combinatorial sequences expressed by a library of coding sequences that encode at each loop position, a consensus amino acid and, if the consensus amino acid has a frequency of occurrence equal to or less than a selected threshold frequency of at least 50%, other natural variant amino acids, including semi-conserved amino acids and variable amino acids whose occurrence rate is above a selected minimum threshold occurrence at that position, or their chemical equivalents, (vi) incorporating the library of coding sequences into framework FN3 coding sequences to form an FN3 expression library, and (vii) expressing the FN3 polypeptides of the expression library.
 2. The method of claim 1, wherein the given threshold frequency is 100%.
 3. The method of claim 1, wherein the given threshold frequency is a selected frequency between 50-95%.
 4. The method of claim 1, wherein step (ii) segregates the loops and loop lengths into the group consisting of BC/11, BC/14, BC/15, DE/6, FG/8, and FG/11.
 5. The method of claim 5, wherein the library formed has a library of walk-through mutagenesis sequences formed at each of the loops and loop lengths selected from the group consisting of BC/11, BC/14, BC/15, DE/6, FG/8, and FG/11, and from each of the common target amino acids selected from the group consisting of lysine, glutamine, aspartic acid, tyrosine, leucine, proline, serine, histidine, and glycine.
 6. The method of claim 1, wherein the library formed has a library of natural-variant combinatorial sequences at a combination of loops and loop lengths selected from loops BC and DE, BC and FG, and DE and FG loops, where the BC loop is selected from one of BC/11, BC/14, and BC/15, the DE loop is DE/6, and the FG loop is selected from one of FG/8 and FG/11.
 7. The method of claim 6, wherein the given threshold is 100%, unless the loop amino acid position contains only one dominant and one variant amino acid, and the dominant and variant amino acids have side chains with similar physicochemical properties, in which case the given threshold is 90%.
 8. The method of claim 6, wherein the two loops in the selected combination of loops and loop lengths have an average diversity of between 10⁵ and 10⁷, or where the average number of different amino acids at each position is equal to or less than 5.
 9. The method of claim 1, wherein said polypeptides have the wildtype amino acid sequences in beta-strand regions A, AB, B, C, CD, D, E, EF, F, and G of the 14^(th) fibronectin Type III module of human fibronectin.
 10. A walk-through mutagenesis library of fibronectin Type 3 (FN3) domain polypeptides useful in screening for the presence of one or more polypeptides having a selected binding or enzymatic activity, said polypeptides comprising: (a) regions A, AB, B, C, CD, D, E, EF, F, and G having wildtype amino acid sequences of a selected native fibronectin Type 3 polypeptide, and (b) loop regions BC, DE, and FG having selected lengths, where at least one selected loop region of a selected length contains a library of walk through mutagenesis sequences expressed by a library of coding sequences that encode, at each loop position, a conserved or selected semi-conserved consensus amino acid and, if the consensus amino acid has an occurrence frequency equal to or less than a selected threshold frequency of at least 50%, a single common target amino acid and any co-produced amino acids.
 11. The library of claim 10, wherein the given threshold frequency is 100%.
 12. The library of claim 10, wherein the given threshold frequency is a selected frequency between 50-95%.
 13. The library of claim 10, wherein loops and loop lengths are selected from the group consisting of BC/11, BC/14, BC/15, DE/6, FG/8, and FG/11, and which has a library of walk-through mutagenesis sequences formed at each of the loops and loop lengths selected from the group consisting of BC/11, BC/14, BC/15, DE/6, FG/8, and FG/11.
 14. The library of claim 13, which has a library of walk-through mutagenesis sequences formed from each of the common target amino acids selected from the group consisting of lysine, glutamine, aspartic acid, tyrosine, leucine, proline, serine, histidine, and glycine.
 15. A natural-variant combinatorial library of fibronectin Type 3 (FN3) domain polypeptides useful in screening for the presence of one or more polypeptides having a selected binding or enzymatic activity, said polypeptides comprising: (a) regions A, AB, B, C, CD, D, E, EF, F, and G having wildtype amino acid sequences of a selected native fibronectin Type 3 polypeptide, and (b) loop regions BC, DE, and FG having selected lengths, where at least one selected loop region of a selected length contains a library of natural-variant combinatorial sequences expressed by a library of coding sequences that encode at each loop position, a conserved or selected semi-conserved consensus amino acid and, if the consensus amino acid has a frequency of occurrence equal to or less than a selected threshold frequency of at least 50%, other natural variant amino acids, including semi-conserved amino acids and variable amino acids whose occurrence rate is above a selected minimum threshold occurrence at that position, or their chemical equivalents.
 16. The library of claim 15, which has a library of natural-variant combinatorial sequences at a combination of loops and loop lengths selected from loops BC and DE, BC and FG, and DE and FG, where the BC loop is selected from one of BC/11, BC/14, and BC/15, the DE loop is DE/6, and the FG loop is selected from one of FG/8 and FG/11.
 17. The library of claim 15, which has, at two of the loop combinations BC and DE, BC and FG, and DE and FG, beneficial mutations identified by screening a universal combinatorial library containing amino acid variants in the two-loop combination, and, at the third loop, identified by FG, DE, and BC, respectively, a library of natural-variant combinatorial sequences at a third loop and loop length identified by BC/11, BC/14, and BC/15, DE/6, or FG/8, and FG/11.
 18. The library of claim 17, wherein each of the two selected loops has an average diversity of between 10⁵ and 10⁷.
 19. The library of claim 15, wherein the given threshold is 100%, unless the loop amino acid position contains only one dominant and one variant amino acid, and the dominant and variant amino acids have side chains with similar physicochemical properties, in which case the given threshold is 90%.
 20. The library of claim 15, wherein said polypeptides have the wildtype amino acid sequences in beta-strand regions A, AB, B, C, CD, D, E, EF, F, and G of the 14th fibronectin Type III module of human fibronectin.
 21. The library of claim 15, wherein said polypeptides have the wildtype amino acid sequences in regions A, AB, B, C, CD, D, E, EF, F, and G of the 10th fibronectin Type III module of human fibronectin.
 22. The library of claim 15, wherein the BC loop length is 11, and has the amino acid sequence identified by SEQ ID NOS: 43 or 49.
 23. The library of claim 15, wherein the BC loop length is 14, and has the amino acid sequence identified by SEQ ID NOS: 44 or 50.
 24. The library of claim 15, wherein the BC loop length is 15, and has the amino acid sequence identified by SEQ ID NOS: 45 or 51.
 25. The library of claim 15, wherein the DE loop length is 6, and has the amino acid sequence identified by SEQ ID NOS: 46 or 52.
 26. The library of claim 15, wherein the FG loop length is 8, and has the amino acid sequence identified by SEQ ID NO: 47, for the first N-terminal six amino acids, or SEQ ID NO: 53.
 27. The library of claim 15, wherein the FG loop length is 11, and has the amino acid sequence identified by SEQ ID NO: 48, for the first N-terminal nine amino acids, or SEQ ID NO: 54.
 28. The library of claim 15, wherein the polypeptides are encoded by an expression library selected from the group consisting of a ribosome display library, a polysome display library, a phage display library, a bacterial expression library, and a yeast display library.
 29. An expression library of polynucleotides encoding the library of polypeptides of claim 15, and produced by synthesizing polynucleotides encoding one or more beta-strand framework regions and one or more loop regions wherein the polynucleotides are predetermined, wherein the polynucleotides encoding said regions further comprise sufficient overlapping sequence whereby the polynucleotide sequences, under polymerase chain reaction (PCR) conditions, are capable of assembly into polynucleotides encoding complete fibronectin binding domains.
 30. A method of identifying a polypeptide having a desired binding affinity with respect to a selected antigen, comprising reacting the natural-variant combinatorial library of FN3 polypeptides of claim 15 with the selected antigen, and screening the FN3 polypeptides to select those having a desired binding affinity with respect to the selected antigen.
 31. The method of claim 30, wherein the method further comprises the step of identifying the polynucleotide that encodes the selected fibronectin binding domain.
 32. A TNF-α binding protein having a K_d binding constant equal to or greater than 0.1 μM and having a sequence selected from SEQ ID NOS: 55-63.
 33. A VEGF binding protein having a K_d binding constant equal to or greater than 0.1 μM and having a sequence selected from SEQ ID NOS: 64-67.
 34. A HMGB1 binding protein having a K_d binding constant equal to or greater than 0.1 μM and having a sequence selected from SEQ ID NOS: 67-81.
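
The per-position selection rule recited in claim 15 can be illustrated with a minimal sketch. It is not the applicants' implementation. The function names, the toy loop sequences, the 90% threshold frequency, and the 15% minimum occurrence are all hypothetical values chosen for illustration, and chemical equivalents are not modeled. At each aligned loop position the sketch keeps the consensus amino acid and, when the consensus frequency is equal to or less than the threshold, adds the other natural variants whose occurrence exceeds the minimum.

```python
from collections import Counter


def position_variants(column, threshold=0.9, min_occurrence=0.15):
    """Return the amino acids allowed at one loop position.

    column: the residues observed at this position across aligned loops.
    threshold, min_occurrence: illustrative (assumed) frequency cut-offs.
    """
    counts = Counter(column)
    total = len(column)
    consensus, top_count = counts.most_common(1)[0]
    allowed = [consensus]
    # Variants are added only when the consensus frequency is equal to
    # or less than the selected threshold frequency.
    if top_count / total <= threshold:
        allowed += sorted(
            aa for aa, n in counts.items()
            if aa != consensus and n / total > min_occurrence
        )
    return allowed


def loop_library_design(aligned_loops, threshold=0.9, min_occurrence=0.15):
    """Apply the per-position rule to equal-length aligned loop sequences."""
    if len({len(seq) for seq in aligned_loops}) != 1:
        raise ValueError("all loop sequences must have the same length")
    return [
        position_variants(column, threshold, min_occurrence)
        for column in zip(*aligned_loops)
    ]


def diversity(design):
    """Number of distinct loop sequences encoded by a design."""
    size = 1
    for allowed in design:
        size *= len(allowed)
    return size


# Hypothetical four-residue loops, not taken from any SEQ ID NO.
toy_loops = ["SWTA", "SWKA", "SYRA", "SWEG", "SFTA"]
design = loop_library_design(toy_loops)
print(design)
print(diversity(design))
```

In this toy alignment the first position is fully conserved and stays fixed, while the remaining positions fall at or below the assumed threshold and admit their observed variants. The product of the per-position counts gives the theoretical size of the single-loop library.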